Abstract. We propose a number of techniques for obtaining a global ranking
from data that may be incomplete and imbalanced, characteristics that are
almost universal to modern datasets coming from e-commerce and internet
applications. We are primarily interested in cardinal data based on scores or
ratings, though our methods also give specific insights on ordinal data. From
raw ranking data, we construct pairwise rankings, represented as edge flows
on an appropriate graph. Our statistical ranking method exploits the graph
Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator
or vector Laplacian, in much the same way the graph Laplacian is an analogue
of the Laplace operator or scalar Laplacian. We shall study the graph
Helmholtzian using combinatorial Hodge theory, which provides a way to unravel
ranking information from edge flows. In particular, we show that every
edge flow representing pairwise ranking can be resolved into two orthogonal
components, a gradient flow that represents the $\ell_2$-optimal global ranking and
a divergence-free flow (cyclic) that measures the validity of the global ranking
obtained; if this is large, then it indicates that the data does not have a
good global ranking. This divergence-free flow can be further decomposed
orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic
but globally cyclic); these provide information on whether inconsistency in
the ranking data arises locally or globally.
When applied to statistical ranking problems, Hodge decomposition sheds
light on whether a given dataset may be globally ranked in a meaningful way
or if the data is inherently inconsistent and thus could not have any reasonable
global ranking; in the latter case it provides information on the nature of the
inconsistencies. An obvious advantage over the NP-hard Kemeny optimization
is that the discrete Hodge decomposition may be easily computed
via a linear least squares regression. We also investigate the $\ell_1$-projection of
edge flows, showing that this has a dual given by correlation maximization over
bounded divergence-free flows, and the $\ell_1$-approximate sparse cyclic ranking,
showing that this has a dual given by correlation maximization over bounded
curl-free flows. We discuss connections with well-known ordinal ranking techniques
such as Kemeny optimization and Borda count from social choice theory.



1. Introduction 

The problem of ranking in various contexts has become increasingly important 
in machine learning. Many datasets require some form of ranking to facilitate iden- 
tification of important entries, extraction of principal attributes, and to perform 
efficient search and sort operations. Modern internet and e-commerce applications 
have spurred an enormous growth in such datasets: Google's search engine, CiteSeer's
citation database, eBay's feedback-reputation mechanism, Netflix's movie
recommendation system, all accumulate a large volume of data that needs to be 
ranked. 

These modern datasets typically have one or more of the following features that 
render traditional ranking methods (such as those in social choice theory) inappli- 
cable or ineffective: (1) unlike traditional ranking problems such as votings and 
tournaments, the data often contains cardinal scores instead of ordinal orderings; 
(2) the given data is largely incomplete with most entries missing a substantial 
amount of information; (3) the data will almost always be imbalanced where the
amount of available information varies widely from entry to entry and/or from cri- 
terion to criterion; (4) the given data often lives on a large complex network, either 
explicitly or implicitly, and the structure of this underlying network is itself impor- 
tant in the ranking process. These new features have posed new challenges and call 
for new techniques. In this paper we will look at a method that addresses them to 
some extent. 

A fundamental problem here is to globally rank a set of alternatives based on 
scores given by voters. Here the words 'alternatives' and 'voters' are used in a 
generalized sense that depends on the context. For example, the alternatives may 
be websites indexed by Google, scholarly articles indexed by CiteSeer, sellers on 
eBay, or movies on Netflix; the voters in the corresponding contexts may be other 
websites, other scholarly articles, buyers, or viewers. The 'voters' could also refer 
to groups of voters: e.g. websites, articles, buyers, or viewers grouped respectively 
by topics, authorship, buying patterns, or movie tastes. The 'voters' could even 
refer to something entirely abstract, such as a collection of different criteria used 
to judge the alternatives. 

The features (1)-(4) can be observed in the aforementioned examples. In the
eBay/Netflix context, a buyer/viewer would assign cardinal scores (1 through 5
stars) to sellers/movies instead of ranking them in an ordinal fashion; the eBay/Netflix
datasets are highly incomplete since most buyers/viewers would have rated only a 
very small fraction of the sellers/movies, and also highly imbalanced since a handful 
of popular sellers/blockbuster movies will have received an overwhelming number of 
ratings while the vast majority will get only a moderate or small number of ratings. 
The datasets from Google and CiteSeer have obvious underlying network structures 
given by hyperlinks and citations respectively. Somewhat less obvious are the net- 
work structures underlying the datasets from eBay and Netflix, which come from 
aggregating the pairwise comparisons of sellers/movies over all buyers/viewers. Indeed,
we shall see that in all these ranking problems, graph structures naturally
arise from pairwise comparisons, irrespective of whether there is an obvious under- 
lying network (e.g. from citation, friendship, or hyperlink relations) or not, and this 
serves to place ranking problems of seemingly different nature on an equal graph- 
theoretic footing. The incompleteness and imbalance of the datasets could then 
be manifested as the (edge) sparsity structure and (vertex) degree distribution of 
pairwise comparison graphs. 

In collaborative filtering applications, one often encounters a personalized rank- 
ing problem, when one needs to find a global ranking of alternatives that generates 
the most consensus within a group of voters who share similar interests/tastes. 
While the statistical ranking problem investigated in this paper plays a fundamental
role in such personalized ranking problems, there is also the equally important
problem of clustering voters into interest groups, which our methods do not ad- 
dress. We would like to stress that in this paper we only concern ourselves with 
the ranking problem but not the clustering problem. So while we have made use of 
the Netflix prize dataset to motivate our studies, our paper should not be viewed 
as an attempt to solve the Netflix prize problem. 

The method that we will use to analyze pairwise rankings, which we represent as 
edge flows on a graph, comes from discrete or combinatorial Hodge theory. Among 
other things, combinatorial Hodge theory provides us with a means to determine
a global ranking that also comes with a 'certificate of reliability' for the validity
of this global ranking. While Hodge theory is well known to pure mathematicians
as a cornerstone of geometry and topology, and to applied mathematicians as an
important tool in computational electromagnetics and fluid dynamics, its application
to statistical ranking problems has, to the best of our knowledge, never been
studied.

In all our proposed methods, the graph in question has as its vertices the alter- 
natives to be ranked, voters' preferences are then quantified and aggregated (we 
will say how later) into an edge flow on this graph. Hodge theory then yields 
an orthogonal decomposition of the edge flow into three components: a gradient 
flow that is globally acyclic, a harmonic flow that is locally acyclic but globally 
cyclic, and a curl flow that is locally cyclic. This decomposition is known as the 
Hodge decomposition. The usefulness of the decomposition lies in the fact that the 
gradient flow component induces a global ranking of the alternatives. Unlike the 
computationally intractable Kemeny optimal, this may be easily computed via a 
linear least squares problem. Furthermore, the $\ell_2$-norm of the least squares residual,
which represents the contribution from the sum of the remaining curl flow and harmonic
flow components, quantifies the validity of the global ranking induced by the
gradient flow component. If the residual is small, then the gradient flow accounts
for most of the variation in the underlying data and therefore the global ranking
obtained from it is expected to be a majority consensus. On the other hand, if
the residual is large, then the underlying data is plagued with cyclic inconsistencies
(i.e. intransitive preference relations of the form $a \succeq b \succeq c \succeq \cdots \succeq z \succeq a$) and one
may not assign any reasonable global ranking to it.

We would like to point out here that cyclic inconsistencies are not necessarily 
due to error or noise in the data but may very well be an inherent characteristic 
of the data. As the famous impossibility theorems from social choice theory [2, 38]
have shown, inconsistency (or, rather, intransitivity) is inevitable in any societal 
preference aggregation that is sophisticated enough. Social scientists have, through 
empirical studies, observed that preference judgements of groups or individuals on a
list of alternatives do in fact exhibit such irrational or inconsistent behavior. Indeed 
in any group decision making process, a lack of consensus is the norm rather than 
the exception in our everyday experience. This is the well-known Condorcet paradox 
[10]: the majority prefers a to b and b to c, but may yet prefer c to a. Even a single
individual making his own preference judgements could face such a dilemma if he
uses multiple criteria to rank the alternatives. As such, cyclic inconsistencies are
intrinsic to any real world ranking data and should be thoroughly analyzed. Hodge
theory again provides a means to do so. The curl flow and harmonic flow components
of an edge flow quantify respectively the local and global cyclic inconsistencies. 
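
To make the Condorcet paradox concrete, the following minimal Python sketch tallies majority margins for three hypothetical ballots. The ballots and the helper name `majority_margin` are illustrative assumptions, not data or code from this paper. Each of the three pairwise contests is won by a majority, yet the winners form a cycle, so no global ranking agrees with all three majorities.

```python
# Hypothetical ballots: each voter lists alternatives from most to least preferred.
ballots = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
]

def majority_margin(ballots, x, y):
    """Number of voters preferring x to y minus the number preferring y to x."""
    margin = 0
    for ballot in ballots:
        margin += 1 if ballot.index(x) < ballot.index(y) else -1
    return margin

# Check the three contests a vs b, b vs c, c vs a.
contests = [("a", "b"), ("b", "c"), ("c", "a")]
margins = {(x, y): majority_margin(ballots, x, y) for x, y in contests}

for (x, y), margin in margins.items():
    print(f"{x} over {y}: margin {margin:+d}")
```

A positive margin on all three contests means the majority preference is intransitive on this toy input.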

Loosely speaking, a dominant curl flow component suggests that the inconsisten- 
cies are of a local nature while a dominant harmonic flow component suggests that 
they are of a global nature. If most of the inconsistencies come from the curl (local) 
component while the harmonic (global) component is small, then this roughly trans- 
lates to mean that the ordering of closely ranked alternatives is unreliable but that 
of very differently ranked alternatives is reliable, i.e. we cannot say with confidence 
whether the ordering of the 27th, 28th, 29th ranked items makes sense but we can 
say with confidence that the 4th, 60th, 100th items should be ordered according 
to their rank. In other words, the Condorcet paradox may well apply to items ranked
close together but not to items ranked far apart. For example, if a large number
of gourmets (voters) are asked to state their preferences on an extensive range of 
food items (alternatives), there may not be a consensus for their preferences with 
regard to hamburgers, hot dogs, and pizzas and there may not be a consensus for 
their preferences with regard to caviar, foie gras, and truffles; but there may well 
be a near universal preference for the latter group of food items over the former 
group. In this case, the inconsistencies will be mostly local and we should expect 
a large curl flow component. If in addition the harmonic flow component is small, 
then most of the inconsistencies happen locally and we could interpret this to mean 
that the global ranking is valid on a coarse scale (ranking different groups of food) 
but not on a fine scale (ranking similar food items belonging to a particular group).
We refer the reader to Section 8 for an explicit example based on the Netflix prize
dataset. 

When studied in conjunction with robust regression and compressed sensing, the 
three orthogonal subspaces given by Hodge decomposition provide other insights. In 
this paper we will see two results involving $\ell_1$-optimizations where these subspaces
provide meaningful and useful interpretations in the primal-dual way: (a) the $\ell_1$-projection
of an edge flow onto the subspace of gradient flows has a dual problem
as the maximal correlation over bounded cyclic flows, i.e. the sum of curl flows and
harmonic flows; (b) the $\ell_1$-approximation of a sparse cyclic flow has a dual problem
as the maximal correlation over bounded locally acyclic flows. These results indicate
that the three orthogonal subspaces could arise even in settings where orthogonality 
is lost. 

1.1. What's New. The main contribution of this paper is in the application of 
Hodge decomposition to the analysis of ranking data. We show that this approach 
has several attractive features: (i) it generalizes the classical Borda Count method 
in voting theory to data that may have missing values; (ii) it provides a way to 
analyze inherent inconsistencies or conflicts in the ranking data; (iii) it is flexible 
enough to be combined with other techniques: these include other ways to form 
pairwise rankings reflecting prior knowledge and the use of $\ell_1$-minimization in place
of $\ell_2$-minimization to encourage robustness or sparsity. Although relatively straightforward
and completely natural, the $\ell_1$ aspects of Hodge theory in Section 6 have,
to the best of our knowledge, never been discussed before.

We emphasize two conceptual aspects underlying this work that are particularly 
unconventional: (1) We believe that obtaining a global ranking, which is the main 
if not the sole objective of all existing work on rank aggregation, gives only an 
incomplete picture of the ranking data; one also needs a 'certificate of reliability'
for the global ranking. Our method provides this certificate by also measuring
the locally and globally inconsistent components of the ranking data. (2) We believe
that with the right mathematical model, rank aggregation need not be a computationally
intractable task. The model that we propose in this paper reduces rank
aggregation to a linear least squares regression, avoiding the usual NP-hard combinatorial
optimization problems such as finding Kemeny optima or minimum feedback
arc sets.

Hodge and Helmholtz decompositions are of course well-known in mathematics 
and physics, but usually in a continuous setting where the underlying spaces have 
the structure of a Riemannian manifold or an algebraic variety. The combinatorial 
Hodge theory that we presented here is arguably a trivial case with the simplest 
possible underlying space — a graph. Many of the difficulties in developing Hodge 
theory in differential and algebraic geometry simply do not surface in our case. 
However this also makes combinatorial Hodge theory accessible — the way we 
developed and presented it essentially requires nothing more than some elementary 
matrix theory and multivariate calculus. We are unaware of similar treatments 
in the existing literature and would consider our elementary treatment a minor 
expository contribution that might help popularize the use of Hodge decomposition 
and the graph Helmholtzian, possibly to other areas in data analysis and machine 
learning. 

1.2. Organization of this Paper. In Section 2 we introduce the main problem
and discuss how a pairwise comparison graph may be constructed from data comprising
cardinal scores by voters on alternatives and how a simple least squares
regression may be used to compute the desired solution. We define the combinatorial
curl, a measure of local (triangular) inconsistency for such data, and also the
combinatorial gradient and combinatorial divergence. Section 3 presents a purely
matrix-theoretic view of Hodge theory, but at the expense of some geometric insights.
These are covered when we formally introduce Hodge theory in Section 4.
We first remind the reader how one may construct a $d$-dimensional simplicial complex
from any given graph (the pairwise comparison graph in our case) by simply
filling in all its $k$-cliques for $k \le d$. Then we will introduce combinatorial Hodge
theory for a general $d$-dimensional simplicial complex but focusing on the $d = 2$
case and its relevance to the ranking problem. In Section 5 we discuss the implications
of Hodge decomposition applied to ranking, with a deeper analysis of the
least squares method in Section 2. Section 6 extends the analysis to two closely related
$\ell_1$-minimization problems, the $\ell_1$-projection of pairwise ranking onto gradient
flows and the $\ell_1$-approximate sparse cyclic ranking. A discussion of the connections
with Kemeny optimization and Borda count in social choice theory can be found
in Section 7. Numerical experiments on three real datasets are given in Section 8
to illustrate some basic ideas in this paper.

1.3. Notations. Let $V$ be a finite set. We will adopt the following notation from
combinatorics:

    $\binom{V}{k}$ := set of all $k$-element subsets of $V$.

In particular $\binom{V}{2}$ would be the set of all unordered pairs of elements of $V$ and $\binom{V}{3}$
would be the set of all unordered triples of elements of $V$ (the sets of ordered pairs
and ordered triples will be denoted $V \times V$ and $V \times V \times V$ as usual). We will not
distinguish between $V$ and $\binom{V}{1}$. Ordered and unordered pairs will be delimited by
parentheses $(i, j)$ and braces $\{i, j\}$ respectively, and likewise for triples and $n$-tuples
in general.

We will use positive integers to label alternatives and voters. Henceforth, V 
will always be the set {1, . . . , n} and will denote a set of alternatives to be ranked. 
In our approach to statistical ranking, these alternatives would be represented as 
vertices of a graph. $\Lambda = \{1, \dots, m\}$ will denote a set of voters. For $i, j \in V$, we
write $i \succeq j$ to mean that alternative $i$ is preferred over alternative $j$. If we wish
to emphasize the preference judgement of a particular voter $\alpha \in \Lambda$, we will write
$i \succeq_\alpha j$.

Since our approach mandates that we borrow terminologies from graph theory, 
vector calculus, linear algebra, algebraic topology, as well as various ranking theo- 
retic terms, we think that it would help to summarize some of the correspondence 
here. 



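Graph theory          Linear algebra                                                     Vec. calculus       Topology   Ranking
--------------------  -----------------------------------------------------------------  ------------------  ---------  ------------------
Function on vertices  Vector in $\mathbb{R}^n$                                           Potential function  0-cochain  Score function
Edge flow             Skew-symmetric matrix in $\mathbb{R}^{n \times n}$                 Vector field        1-cochain  Pairwise ranking
Triangular flow       Skew-symmetric hypermatrix in $\mathbb{R}^{n \times n \times n}$   Tensor field        2-cochain  Triplewise ranking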



As the reader will see, the notions of gradient, divergence, curl, Laplace operator, 
and Helmholtz operator from vector calculus and topology will play important roles 
in statistical ranking. One novelty of our approach lies in extending these notions 
to the other three columns, where most of them have no well-known equivalent. For 
example, what we will call a harmonic ranking is central to the question of whether 
a global ranking is feasible. This notion is completely natural from the vector
calculus or topology point of view: harmonic rankings correspond to solutions of the
Helmholtz equation or to homology classes. However, it will be hard to define harmonic ranking
directly in social choice theory without this insight, and we suspect that it is the 
reason why the notion of harmonic ranking has never been discussed in existing 
studies of ranking in social choice theory and other fields. 

2. Statistical Ranking on Graphs 

The main problem discussed in this paper is that of determining a global ranking 
from a dataset comprising a set of alternatives ranked by a set of voters. This is 
a problem that has received attention in fields including decision science [35, 36],
financial economics [HHH], machine learning [5J [T^l [T71 HU] , social choice 
statistics [HI HSldSl 12211301 1311 132], among others. Our objective towards statistical 
ranking is two-fold: like everybody else, we want to deduce a global ranking from 
the data whenever possible; but in addition to that, we also want to detect when 
the data does not permit a statistically meaningful global ranking and in which 
case characterize the data in terms of its local and global inconsistencies. 

Let $V = \{1, \dots, n\}$ be the set of alternatives to be ranked and $\Lambda = \{1, \dots, m\}$
be a set of voters. The implicit assumption is that each voter would have rated,
i.e. assigned cardinal scores or given an ordinal ordering to, a small fraction of the
alternatives. But no matter how incomplete the rated portion is, one may always
convert such ratings into pairwise rankings that have no missing values as follows.
For each voter $\alpha \in \Lambda$, the pairwise ranking matrix of $\alpha$ is a skew-symmetric matrix
$Y^\alpha \in \mathbb{R}^{n \times n}$, i.e. for each ordered pair $(i, j) \in V \times V$, we have

    $Y^\alpha_{ij} = -Y^\alpha_{ji}$.

Informally, $Y^\alpha_{ij}$ measures the 'degree of preference' of the $i$th alternative over the
$j$th alternative held by the $\alpha$th voter. Studies of ranking problems in different disciplines
have led to rather different ways of quantifying such 'degree of preference'.
In Section 2.2.1 we will see several ways of defining $Y^\alpha_{ij}$ (as score difference, score
ratio, and score ordering) coming from decision science, machine learning, social
choice theory, and statistics. If the voter $\alpha$ did not compare alternatives $i$ and $j$,
then $Y^\alpha_{ij}$ is considered a missing value and set to be 0 for convenience; this manner
of handling missing values allows $Y^\alpha$ to be a skew-symmetric matrix for each $\alpha \in \Lambda$.
Nevertheless we could have assigned any arbitrary value or a non-numerical symbol
to represent missing values, and this would not have affected our algorithmic results
because of our use of the following weight function.

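Define the weight function $w : \Lambda \times V \times V \to [0, \infty)$ as the indicator function

    $w^\alpha_{ij} = w(\alpha, i, j) = \begin{cases} 1 & \text{if } \alpha \text{ made a pairwise comparison for } \{i, j\}, \\ 0 & \text{otherwise.} \end{cases}$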



Therefore $w^\alpha_{ij} = 0$ iff $Y^\alpha_{ij}$ is a missing value. Note that $W^\alpha = [w^\alpha_{ij}]$ is a symmetric
$\{0, 1\}$-valued matrix; but more generally, $w^\alpha_{ij}$ may be chosen as the capacity (in the
graph theoretic sense) if there are multiple comparisons of $i$ and $j$ by voter $\alpha$. The
pairs $(i, j)$ for which $w(\alpha, i, j) = 1$ for some $\alpha \in \Lambda$ are known as crucial pairs in
the machine learning literature (we thank the reviewers for pointing this out).
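
As an illustration of the construction above, the following sketch builds one voter's pairwise ranking matrix and weight matrix from a vector of cardinal scores. It assumes the score-difference convention (one of several choices discussed in Section 2.2.1), uses `np.nan` to mark unrated alternatives, and sets diagonal weights to zero; the function name `pairwise_from_scores` is hypothetical.

```python
import numpy as np

def pairwise_from_scores(scores):
    """Return (Y, w) for a single voter.

    scores : 1-D sequence of cardinal scores, np.nan where the voter gave no rating.
    Y[i, j] = scores[j] - scores[i] if both i and j were rated, else 0 (missing value).
    w[i, j] = 1 if the voter compared i and j (both rated, i != j), else 0.
    """
    scores = np.asarray(scores, dtype=float)
    rated = ~np.isnan(scores)
    w = np.outer(rated, rated).astype(float)
    np.fill_diagonal(w, 0.0)                      # no self-comparisons (assumption)
    filled = np.where(rated, scores, 0.0)         # placeholder; masked out by w below
    Y = (filled[None, :] - filled[:, None]) * w   # skew-symmetric by construction
    return Y, w

# One voter rated alternatives 0, 2 and 3 but not alternative 1.
Y, w = pairwise_from_scores([5, np.nan, 3, 1])
print(Y)
print(w)
```

Because missing entries are masked by `w`, the value stored in `Y` for an uncompared pair never enters any weighted sum, which is the point made in the text.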

Our general paradigm for statistical ranking is to minimize a weighted sum of
pairwise loss of a global ranking on the given data over a model class $\mathcal{M}$ of all
global rankings. We begin with a simple sum-of-squares loss function,

(1)    $\min_{X \in \mathcal{M}_G} \sum_{\alpha, i, j} w^\alpha_{ij} (X_{ij} - Y^\alpha_{ij})^2$,

where the model class $\mathcal{M}_G$ is a subset of the skew-symmetric matrices,

(2)    $\mathcal{M}_G = \{X \in \mathbb{R}^{n \times n} \mid X_{ij} = s_j - s_i, \; s : V \to \mathbb{R}\}$.

Any $X \in \mathcal{M}_G$ induces a global ranking on the alternatives $1, \dots, n$ via the rule
$i \succeq j$ iff $s_i \ge s_j$. Note that ties, i.e. $i \succeq j$ and $j \succeq i$, are allowed and this happens
precisely when $s_i = s_j$.
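
The sum-of-squares problem (1) over the model class (2) is an ordinary weighted least squares problem in the score vector $s$. The sketch below is one possible way to set it up with NumPy, not the authors' implementation: it stacks one equation $s_j - s_i \approx Y^\alpha_{ij}$ per compared pair and calls `numpy.linalg.lstsq`, which returns the minimum-norm solution (scores are only determined up to an additive constant). The helper names `least_squares_ranking` and `one_voter` and the toy data are assumptions, and diagonal weights are assumed to be zero.

```python
import numpy as np

def least_squares_ranking(Ys, ws):
    """Minimise sum over (alpha, i, j) of ws[alpha,i,j] * (s[j] - s[i] - Ys[alpha,i,j])**2."""
    Ys = np.asarray(Ys, dtype=float)
    ws = np.asarray(ws, dtype=float)
    m, n, _ = Ys.shape
    rows, rhs = [], []
    for a in range(m):
        for i in range(n):
            for j in range(n):
                if ws[a, i, j] > 0:
                    root_w = np.sqrt(ws[a, i, j])
                    row = np.zeros(n)
                    row[j], row[i] = root_w, -root_w   # encodes s[j] - s[i]
                    rows.append(row)
                    rhs.append(root_w * Ys[a, i, j])
    design, target = np.array(rows), np.array(rhs)
    s, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = np.linalg.norm(design @ s - target)
    return s, residual

def one_voter(pairs, n):
    """Build (Y, w) for one voter from a dict {(i, j): Y_ij} of compared pairs."""
    Y, w = np.zeros((n, n)), np.zeros((n, n))
    for (i, j), y in pairs.items():
        Y[i, j], Y[j, i] = y, -y
        w[i, j] = w[j, i] = 1.0
    return Y, w

# Consistent toy data: two voters, each comparing a different pair.
Y0, w0 = one_voter({(0, 1): 1.0}, 3)
Y1, w1 = one_voter({(1, 2): 2.0}, 3)
s, residual = least_squares_ranking([Y0, Y1], [w0, w1])

# Purely cyclic toy data: 0 -> 1 -> 2 -> 0, each by the same margin.
Yc, wc = one_voter({(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0}, 3)
s_cyc, residual_cyc = least_squares_ranking([Yc], [wc])
print(s, residual, s_cyc, residual_cyc)
```

On the consistent input the residual vanishes and the score differences are reproduced; on the cyclic input every alternative gets the same score and the whole flow ends up in the residual.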

For ranking data given in terms of cardinal scores, this simple scheme preserves
the magnitudes of the ratings, instead of merely the ordering, when we have globally
consistent data (see Definition 2.3). Moreover, it may also be computed more easily
than many other loss functions (though the computational cost depends also on the
choice of $\mathcal{M}$). This simple scheme is not as restrictive as it first seems. For example,
Kemeny optimization in classical social choice theory may be realized as a special
case where $Y^\alpha_{ij} \in \{\pm 1\}$ and $\mathcal{M}$ is the Kemeny model class,

(3)    $\mathcal{M}_K = \{X \in \mathbb{R}^{n \times n} \mid X_{ij} = \operatorname{sign}(s_j - s_i), \; s : V \to \mathbb{R}\}$.

The function $\operatorname{sign} : \mathbb{R} \to \{\pm 1\}$ takes nonnegative numbers to 1 and negative numbers
to $-1$. A binary valued $Y^\alpha_{ij}$ is the standard scenario in binary pairwise comparisons
[1, 2, 13, 20, 26]; in this context, a global ranking is usually taken to be
synonymous with a Kemeny optimal. We will discuss Kemeny optimization in greater
detail in Section 7.
2.1. Pairwise Comparison Graphs and Pairwise Ranking Flows. A graph
structure arises naturally from ranking data as follows. Let $G = (V, E)$ be an
undirected graph whose vertex set is $V$, the set of alternatives to be ranked, and
whose edge set is

(4)    $E = \{\{i, j\} \in \binom{V}{2} \mid \sum_\alpha w^\alpha_{ij} > 0\}$,

i.e. the set of pairs where pairwise comparisons have been made. We call such
$G$ a pairwise comparison graph. One can further associate weights on the edges as
capacity, e.g. $w_{ij} = \sum_\alpha w^\alpha_{ij}$.

A pairwise ranking can be viewed as an edge flow on $G$, i.e. a function $X : V \times V \to \mathbb{R}$
that satisfies

(5)    $X(i, j) = -X(j, i)$ if $\{i, j\} \in E$,
       $X(i, j) = 0$ otherwise.

It is clear that a skew-symmetric matrix [Xij] induces an edge flow and vice versa. 
So henceforth we will not distinguish between edge flows and skew-symmetric ma- 
trices and will often write Xij in place of X(i,j). 
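
The edge set (4) and the edge capacities can be read off directly from the per-voter weight matrices. The following is a small pure-Python sketch under the assumption that each voter's weights are given as a nested list; the function name `comparison_graph` and the toy weights are hypothetical.

```python
def comparison_graph(ws):
    """Return {(i, j): w_ij} for i < j, where w_ij is the sum over voters of ws[alpha][i][j].

    Only pairs with positive total weight are kept, so the keys form the edge set E.
    """
    n = len(ws[0])
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(w[i][j] for w in ws)
            if total > 0:
                edges[(i, j)] = total
    return edges

# Voter 0 compared {0, 1}; voter 1 compared {0, 1} and {1, 2}. Nobody compared {0, 2}.
w_voter0 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
w_voter1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
edges = comparison_graph([w_voter0, w_voter1])
print(edges)
```

The missing key `(0, 2)` reflects the incompleteness of the data, and the unequal capacities reflect its imbalance.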

We will now borrow some terminology from vector calculus. An edge flow
of the form $X_{ij} = s_j - s_i$, i.e. $X \in \mathcal{M}_G$, can be regarded as the gradient of a
function $s : V \to \mathbb{R}$, which will be called a potential function (or negative potential,
depending on sign convention). In the context of ranking, a potential function
is a score function or utility function on the set of alternatives, assigning a score
$s(i) = s_i$ to alternative $i$. Note that any such function defines a global ranking as
discussed after (2). To be precise, we define gradient as follows.

Definition 2.1. The combinatorial gradient operator maps a potential function
on the vertices $s : V \to \mathbb{R}$ to an edge flow $\operatorname{grad} s : V \times V \to \mathbb{R}$ via

(6)    $(\operatorname{grad} s)(i, j) = s_j - s_i$.

An edge flow that has this form will be called a gradient flow. 

In other words, the combinatorial gradient takes global rankings to pairwise
rankings. Pairwise rankings that arise in this manner will be called globally consistent
(formally defined in Definition 2.3). Given a globally consistent pairwise
ranking $X$, we can easily solve $\operatorname{grad}(s) = X$ to determine a score function $s$ (up to
an additive constant), and from $s$ we can obtain a global ranking of the alternatives
in the manner described after (2). Observe that the set of all globally consistent
pairwise rankings in (2) may be written as $\mathcal{M}_G = \{\operatorname{grad} s \mid s : V \to \mathbb{R}\} = \operatorname{im}(\operatorname{grad})$.

For convenience, we will drop the adjective 'combinatorial' from 'combinatorial 
gradient'. We may sometimes also drop the adjective 'pairwise' in 'globally consis- 
tent pairwise ranking' when there is no risk of confusion. 
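
For a globally consistent pairwise ranking, solving grad(s) = X amounts to summing the flow along paths from a reference vertex. The sketch below shows this under two assumptions that are not part of the text: the comparison graph is connected, and the additive constant is fixed by setting the score of a chosen root to zero. The function names `grad` and `integrate` are illustrative.

```python
from collections import deque

def grad(s, edges):
    """Combinatorial gradient: (grad s)(i, j) = s[j] - s[i] on every edge, in both directions."""
    flow = {}
    for i, j in edges:
        flow[(i, j)] = s[j] - s[i]
        flow[(j, i)] = s[i] - s[j]
    return flow

def integrate(flow, n, root=0):
    """Recover scores with s[root] = 0 from a globally consistent flow on a connected graph."""
    neighbours = {v: [] for v in range(n)}
    for i, j in flow:
        neighbours[i].append(j)
    s = {root: 0.0}
    queue = deque([root])
    while queue:                      # breadth-first traversal from the root
        i = queue.popleft()
        for j in neighbours[i]:
            if j not in s:
                s[j] = s[i] + flow[(i, j)]
                queue.append(j)
    return [s[v] for v in range(n)]

scores = [2.0, 5.0, 3.0, 7.0]
edges = [(0, 1), (1, 2), (2, 3)]      # a path graph, hence connected
X = grad(scores, edges)
recovered = integrate(X, n=4)
print(recovered)
```

The recovered scores equal the original ones shifted by a constant, so they induce the same global ranking. For a flow that is not globally consistent, the result of `integrate` would depend on the traversal order, which is why the least squares formulation is needed in general.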

The optimization problem (1) can be rewritten in the form of a weighted $\ell_2$-minimization
on a pairwise comparison graph

(7)    $\min_{X \in \mathcal{M}_G} \|X - Y\|_{2,w}^2 = \min_{X \in \mathcal{M}_G} \Bigl[ \sum_{\{i,j\} \in E} w_{ij} (X_{ij} - Y_{ij})^2 \Bigr]$,

where

(8)    $w_{ij} := \sum_\alpha w^\alpha_{ij}$  and  $Y_{ij} := \frac{\sum_\alpha w^\alpha_{ij} Y^\alpha_{ij}}{\sum_\alpha w^\alpha_{ij}}$.

An optimizer thus corresponds to an $\ell_2$-projection of a pairwise ranking edge flow $Y$
onto the space of gradient flows. We note that $W = [w_{ij}] = \sum_\alpha W^\alpha$ is a symmetric
nonnegative-valued matrix. This choice of $W$ is not intended to be rigid. One could
for example define $W$ to incorporate prior knowledge of the relative importance of
the paired comparisons as judged by the voters.
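
The equivalence of the per-voter problem (1) and the aggregated problem (7) can be checked by completing the square on each pair. The following is a sketch of that step, assuming $w_{ij}$ denotes the total weight on the pair and $Y_{ij}$ the weighted mean of the $Y^\alpha_{ij}$, and taking a pair with $w_{ij} > 0$:

```latex
\begin{align*}
\sum_{\alpha} w^{\alpha}_{ij}\,(X_{ij} - Y^{\alpha}_{ij})^2
  &= w_{ij} X_{ij}^2 - 2 X_{ij} \sum_{\alpha} w^{\alpha}_{ij} Y^{\alpha}_{ij}
     + \sum_{\alpha} w^{\alpha}_{ij} (Y^{\alpha}_{ij})^2 \\
  &= w_{ij}\,(X_{ij} - Y_{ij})^2
     + \Bigl[\, \sum_{\alpha} w^{\alpha}_{ij} (Y^{\alpha}_{ij})^2 - w_{ij} Y_{ij}^2 \Bigr],
\end{align*}
\qquad\text{where}\qquad
w_{ij} = \sum_{\alpha} w^{\alpha}_{ij}, \quad
w_{ij} Y_{ij} = \sum_{\alpha} w^{\alpha}_{ij} Y^{\alpha}_{ij}.
```

The bracketed term does not depend on $X$, and pairs with $w_{ij} = 0$ contribute nothing. Summing over ordered pairs counts each edge twice, since both sides are unchanged when $i$ and $j$ are swapped. The objective of (1) therefore equals twice the objective of (7) plus a constant, so the two problems share the same minimizers.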

An interesting variation of this scheme is an analogous $\ell_1$-projection onto the
space of gradient flows,

(9)    $\min_{X \in \mathcal{M}_G} \|X - Y\|_{1,w} = \min_{X \in \mathcal{M}_G} \Bigl[ \sum_{\{i,j\} \in E} w_{ij} |X_{ij} - Y_{ij}| \Bigr]$.

Its solutions are more robust to outliers or large deviations in $Y_{ij}$ as (9) may be
regarded as the least absolute deviation (LAD) method in robust regression. We
will discuss this problem in greater detail in Section 6.1.
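
Like any least absolute deviation problem, the $\ell_1$-projection can be posed as a linear program by introducing one slack variable per edge. The formulation below is the standard LAD reformulation, given here as an illustration rather than as the treatment in Section 6.1; the slack variables $t_{ij}$ are notation introduced for this sketch, and each edge $\{i, j\}$ is taken with one fixed orientation.

```latex
\begin{align*}
\min_{s \in \mathbb{R}^n,\; t} \quad & \sum_{\{i,j\} \in E} w_{ij}\, t_{ij} \\
\text{subject to} \quad & -t_{ij} \;\le\; s_j - s_i - Y_{ij} \;\le\; t_{ij},
  \qquad \{i,j\} \in E .
\end{align*}
```

At an optimum, $t_{ij} = |s_j - s_i - Y_{ij}|$ on every edge with $w_{ij} > 0$, so the optimal value coincides with the weighted $\ell_1$ objective evaluated at $X = \operatorname{grad} s$.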

Combinatorial Hodge theory will provide a geometric interpretation of the optimizer
and residuals of (7) as well as further insights on (9). Before going deeper
into the analysis of such optimization problems, we present several examples of
pairwise ranking arising from applications.



2.2. Pairwise Rankings. Humans are unable to make accurate preference judgements
on even moderately large sets. In fact, it has been argued that most people
can rank only between 5 and 9 alternatives at a time [37]. This is probably why
many rating scales (e.g. the ones used by Amazon, eBay, Netflix, YouTube) are
based on a 5-star scale. Hence one expects large human-generated ranking data to
be at best partially ordered (with chains of lengths about 5 to 9, if [37] is accurate).
For most people, it is a harder task to rank or rate 20 movies than to compare the
movies a pair at a time. In certain settings such as tennis tournaments and wine
tasting, only pairwise comparisons are possible. Pairwise comparison methods,
which involve the smallest partial rankings, are thus natural for analyzing ranking
data.

Pairwise comparisons also help reduce bias due to the arbitrariness of rating
scale by adopting a relative measure. As we will see in Section 2.2.1, pairwise
comparisons provide a way to handle missing values, which are expected because
of the general lack of incentives or patience for a human to process a large dataset.
For these reasons, pairwise comparison methods have been popular in psychology, 
statistics, and social choice theory [331 [THl [351 [2] . Such methods are also getting 
increasing attention from the machine learning community as they may be adapted 
for studying classification problems [T9l [T7\ 120] . We will present two very different 
instances where pairwise rankings arise: recommendation systems and exchange 
economic systems. 



2.2.1. Recommendation systems. The generic scenario in recommendation systems 
is that there are m voters rating n alternatives. For example, in the Netflix context, 
viewers will rate a movie on a scale of 5 stars [5]; in financial markets, analysts will 
rate a stock or a security by 5 classes of recommendations [4]. In these cases, we let 
$A = [a_{\alpha i}] \in \mathbb{R}^{m \times n}$ represent the voter-alternative matrix. $A$ typically has a large
number of missing values; for example, the dataset that Netflix released for its prize 
competition contains a viewer-movie matrix with 99% of its values missing. The 
standard problem here is to predict these missing values from the given data but 
we caution the reader again that this is not the problem addressed in our paper. 
Instead of estimating the missing values of A, we want to learn a global ranking of 
the alternatives from A, without having to first estimate the missing values. 
Even though the matrix A may be highly incomplete, we may aggregate over all
voters to get a pairwise ranking matrix using one of the four following methods. 

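(1) Arithmetic mean of score differences: The score difference refers to
$Y^\alpha_{ij} = a_{\alpha j} - a_{\alpha i}$. The arithmetic mean over all customers who have rated
both $i$ and $j$ is

    $Y_{ij} = \frac{\sum_\alpha (a_{\alpha j} - a_{\alpha i})}{\#\{\alpha \mid a_{\alpha i}, a_{\alpha j} \text{ exist}\}}$.

This is translation invariant.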

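(2) Geometric mean of score ratios: Assume $A > 0$. The score ratio
refers to $Y^\alpha_{ij} = a_{\alpha j} / a_{\alpha i}$. The (log) geometric mean over all customers who
have rated both $i$ and $j$ is

    $Y_{ij} = \frac{\sum_\alpha (\log a_{\alpha j} - \log a_{\alpha i})}{\#\{\alpha \mid a_{\alpha i}, a_{\alpha j} \text{ exist}\}}$.

This is scale invariant.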

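(3) Binary comparison: Here $Y^\alpha_{ij} = \operatorname{sign}(a_{\alpha j} - a_{\alpha i})$. Its average is the
difference between the probability that alternative $j$ is preferred to $i$ and the
probability of the other way round,

    $Y_{ij} = \Pr\{\alpha \mid a_{\alpha j} > a_{\alpha i}\} - \Pr\{\alpha \mid a_{\alpha j} < a_{\alpha i}\}$.

This is invariant up to a monotone transformation.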

(4) Logarithmic odds ratio: As in the case of binary comparison, except 
that we adopt a logarithmic scale 



Prja I Ugj > agj} 



= log : 



This is also invariant up to a monotone transformation. 

Each of these four statistics is a form of "average pairwise ranking" over all 
voters. The first model leads to the concept of position-rules in social choice theory 
[51] and it has also been used in machine learning recently [T^. The second model 
has appeared in multi-criteria decision theory [35]. The third and fourth models
are known as the linear model [32] and the Bradley-Terry model [B] respectively in the
statistics and psychology literature. There are other plausible choices for defining
$Y_{ij}$, e.g. [42, 29, 30, 31], but we will not discuss more of them here. It suffices
to note that there is a rich variety of techniques to preprocess raw ranking data 
into the pairwise ranking edge flow Yij that serves as input to our Hodge theoretic 
method. However, it should be noted that the $l_2$- and $l_1$-optimization on graphs
may be applied with any of the four choices above, since only the
knowledge of $Y_{ij}$ is required, whereas the sum-of-squares and Kemeny optimization in
(1) and (3) require the original score difference or score order data to be known for
each voter.
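
To make the preprocessing step concrete, the following is a minimal sketch of how the four statistics might be computed from an incomplete voter-alternative matrix. The function name `pairwise_flows`, the use of `np.nan` for missing ratings, and the toy matrix are assumptions made for illustration only; the log odds are computed with strict inequalities and left undefined (`nan`) whenever one of the two probabilities is zero.

```python
import numpy as np

def pairwise_flows(A):
    """A: (m voters) x (n alternatives), np.nan marks a missing rating.
    Returns the four n x n pairwise statistics and the co-rating counts."""
    m, n = A.shape
    observed = ~np.isnan(A)
    names = ("arithmetic", "geometric", "binary", "log_odds")
    stats = {name: np.zeros((n, n)) for name in names}
    counts = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            both = observed[:, i] & observed[:, j]   # voters who rated i and j
            counts[i, j] = int(both.sum())
            if i == j or counts[i, j] == 0:
                continue
            ai, aj = A[both, i], A[both, j]
            stats["arithmetic"][i, j] = np.mean(aj - ai)
            stats["geometric"][i, j] = np.mean(np.log(aj) - np.log(ai))
            stats["binary"][i, j] = np.mean(np.sign(aj - ai))
            p_up, p_down = np.mean(aj > ai), np.mean(aj < ai)
            # Assumption: leave the log odds undefined if either probability is 0.
            stats["log_odds"][i, j] = (
                np.log(p_up / p_down) if p_up > 0 and p_down > 0 else np.nan
            )
    return stats, counts

nan = np.nan
A = np.array([[5, 3, nan],
              [4, nan, 2],
              [nan, 4, 1],
              [5, 2, 1]], dtype=float)   # toy ratings, 4 voters x 3 alternatives
stats, counts = pairwise_flows(A)
```

Each of the first three matrices is skew-symmetric by construction, and `counts` records how many voters rated both alternatives, which is one natural choice of edge weight.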

2.2.2. Exchange economic systems. A purely exchange economic system may be
described by a graph $G = (V, E)$ with vertex set $V = \{1, \dots, n\}$ representing the
$n$ goods and edge set $E \subseteq \binom{V}{2}$ representing feasible pairwise transactions. If the
market is complete in the sense that every pair of goods is exchangeable, then $G$
is a complete graph. Suppose the exchange rate between the $i$th and $j$th goods is
given by

$$1 \text{ unit } i = a_{ij} \text{ unit } j, \qquad a_{ij} > 0.$$

Then the exchange rate matrix $A = [a_{ij}]$ is a reciprocal matrix (possibly with
missing values), i.e. $a_{ij} = 1/a_{ji}$ for all $i, j \in V$. The reciprocal matrix was first
used in the studies of paired preference aggregation by Saaty [35]; it was also used
by Ma [28] to study currency exchange markets. A pricing problem here is to look
for a universal equivalent which measures the values of goods (this is in fact an
abstraction of the concept of money), i.e. $\pi : V \to \mathbb{R}$ such that

$$a_{ij} = \pi_j / \pi_i.$$

In complete markets where $G$ is a complete graph, there exists a universal equivalent
if and only if the market is triangular arbitrage-free, i.e. $a_{ij} a_{jk} = a_{ik}$ for all distinct
$i, j, k \in V$; since in this case the transaction path $i \to j \to k$ provides neither gain nor
loss over a direct exchange $i \to k$.

Such a purely exchange economic system is equivalent to pairwise ranking via the
logarithmic map,

$$X_{ij} = \log a_{ij}.$$

The triangular arbitrage-free condition is then equivalent to the transitivity
condition in (11), i.e.

$$X_{ij} + X_{jk} + X_{ki} = 0.$$

So asking if a universal equivalent exists is the same as asking if a global ranking
$s : V \to \mathbb{R}$ exists so that $X_{ij} = s_j - s_i$ with $s_i = \log \pi_i$.
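
The equivalence between arbitrage-free exchange rates and globally consistent pairwise rankings can be checked numerically. The sketch below uses a hypothetical price vector `pi` for three goods and an illustrative helper `max_triangle_violation`; neither comes from the text. It builds $a_{ij} = \pi_j/\pi_i$, takes logarithms, recovers the prices up to a common factor, and then perturbs one rate to create a triangular arbitrage.

```python
import numpy as np

# Hypothetical prices (a universal equivalent) for three goods.
pi = np.array([1.0, 2.0, 8.0])

# Arbitrage-free exchange rates a_ij = pi_j / pi_i; reciprocal by construction.
A = pi[None, :] / pi[:, None]
X = np.log(A)                      # pairwise ranking X_ij = log a_ij

def max_triangle_violation(X):
    """Largest |X_ij + X_jk + X_ki| over all triples (0 means transitive)."""
    n = X.shape[0]
    return max(abs(X[i, j] + X[j, k] + X[k, i])
               for i in range(n) for j in range(n) for k in range(n))

# With X_ij = s_j - s_i and the normalisation s_0 = 0, row 0 of X is s itself.
s = X[0, :]
pi_recovered = np.exp(s)           # prices relative to good 0

# Perturb a single rate (keeping reciprocity) to introduce an arbitrage.
B = A.copy()
B[0, 1] *= 1.1
B[1, 0] = 1.0 / B[0, 1]
violation = max_triangle_violation(np.log(B))
```

In this example the violation equals $\log 1.1$, the log-gain available on the loop through goods 0, 1 and 2.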

2.3. Measure of Triangular Inconsistency: combinatorial curl. Upon constructing
pairwise rankings from the raw data, we need a statistic to quantify the
inconsistency in the pairwise rankings. Again we will borrow terminology from
vector calculus and define a notion of combinatorial curl as a measure of triangular
inconsistency.

Given a pairwise ranking represented as an edge flow $X$ on a graph $G = (V, E)$,
we expect the following 'consistency' property: following a loop $i \to j \to \cdots \to i$
where each edge is in $E$, the amount of the scores raised should be equal to the
amount of the scores lowered; so after a loop of comparisons we should return to
the same score on the same alternative. Since the simplest loop is a triangular loop
$i \to j \to k \to i$, the 'basic unit' of inconsistency should be triangular in nature and
this leads us to the combinatorial curl in Definition 2.2.

We will first define a notion analogous to edge flows. The triangular flow on $G$
is a function $\Phi : V \times V \times V \to \mathbb{R}$ that satisfies

$$\Phi(i,j,k) = \Phi(j,k,i) = \Phi(k,i,j) = -\Phi(j,i,k) = -\Phi(i,k,j) = -\Phi(k,j,i),$$

i.e. an odd permutation of the arguments of $\Phi$ changes its sign while an even
permutation preserves its sign. A triangular flow describes triplewise rankings in
the same way an edge flow describes pairwise rankings.

Definition 2.2. Let $X$ be an edge flow on a graph $G = (V, E)$. Let

$$T(E) := \left\{ \{i,j,k\} \in \tbinom{V}{3} \;\middle|\; \{i,j\}, \{j,k\}, \{k,i\} \in E \right\}$$

be the collection of triangles with every edge in $E$. We define the combinatorial
curl operator that maps edge flows to triangular flows by

$$(\operatorname{curl} X)(i,j,k) = \begin{cases} X_{ij} + X_{jk} + X_{ki} & \text{if } \{i,j,k\} \in T(E), \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

In other words, the combinatorial curl takes pairwise rankings to triplewise
rankings. Again, we will drop the adjective 'combinatorial' when there is no risk of
confusion. The skew-symmetry of $X$, i.e. $X_{ij} = -X_{ji}$, guarantees that $\operatorname{curl} X$ is a
triangular flow, i.e.

$$(\operatorname{curl} X)(i,j,k) = (\operatorname{curl} X)(j,k,i) = (\operatorname{curl} X)(k,i,j)$$
$$= -(\operatorname{curl} X)(j,i,k) = -(\operatorname{curl} X)(i,k,j) = -(\operatorname{curl} X)(k,j,i).$$

Footnote 2: A triangular flow is an alternating 3-tensor and may be represented as a
skew-symmetric hypermatrix $[\Phi_{ijk}] \in \mathbb{R}^{n \times n \times n}$, much like an edge flow is an
alternating 2-tensor and may be represented by a skew-symmetric matrix
$[X_{ij}] \in \mathbb{R}^{n \times n}$. We will often write $\Phi_{ijk}$ in place of $\Phi(i,j,k)$.

The curl of a pairwise ranking measures its triangular inconsistency. This
extends the consistency index of Kendall and Smith [26], which counts the number
of circular triads, from ordinal settings to cardinal settings. Note that for binary
pairwise ranking where $X_{ij} \in \{\pm 1\}$, the absolute value $|(\operatorname{curl} X)(i,j,k)|$ may only
take two values, 1 or 3. The triangle $\{i,j,k\} \in T(E)$ contains a cyclic ranking or
circular triad if and only if $|(\operatorname{curl} X)(i,j,k)| = 3$. If $G$ is a complete graph, the
number of circular triads has been shown [26] to be

$$N = \frac{n}{24}(n^2 - 1) - \frac{1}{8} \sum_{i} \Big[ \sum_{j} X_{ij} \Big]^2.$$
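
As an illustrative check, the sketch below evaluates the curl on every triangle of a random binary pairwise ranking on a complete graph and compares the number of circular triads with the closed form $N = \frac{n}{24}(n^2-1) - \frac{1}{8}\sum_i [\sum_j X_{ij}]^2$. The helper name `curl`, the representation of $E$ as a set of `frozenset` pairs, and the random seed are assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def curl(X, E):
    """(curl X)(i, j, k) for i < j < k whenever all three edges lie in E."""
    n = X.shape[0]
    return {(i, j, k): X[i, j] + X[j, k] + X[k, i]
            for i, j, k in combinations(range(n), 3)
            if {frozenset((i, j)), frozenset((j, k)), frozenset((k, i))} <= E}

rng = np.random.default_rng(0)
n = 7
X = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    X[i, j] = rng.choice([-1.0, 1.0])    # random binary comparison
    X[j, i] = -X[i, j]                   # skew-symmetry
E = {frozenset(e) for e in combinations(range(n), 2)}   # complete graph

c = curl(X, E)
triads = sum(1 for v in c.values() if abs(v) == 3)       # circular triads
formula = n * (n**2 - 1) / 24 - (X.sum(axis=1) ** 2).sum() / 8
```

For binary data every curl value is $\pm 1$ or $\pm 3$, and the direct count `triads` agrees with `formula`.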



For ranking data given in terms of cardinal scores and that is generally incom- 
plete, curl plays an extended role in addition to just quantifying the triangular 
inconsistency. We now formally define some ranking theoretic notions in terms of 
the combinatorial gradient and combinatorial curl. 

Definition 2.3. Let $X : V \times V \to \mathbb{R}$ be a pairwise ranking edge flow on a pairwise
comparison graph $G = (V, E)$.

(1) $X$ is called consistent on $\{i,j,k\} \in T(E)$ if it is curl-free on $\{i,j,k\}$, i.e.

$$(\operatorname{curl} X)(i,j,k) = X_{ij} + X_{jk} + X_{ki} = 0.$$

Note that this implies that $(\operatorname{curl} X)(\sigma(i), \sigma(j), \sigma(k)) = 0$ for every
permutation $\sigma$.

(2) $X$ is called globally consistent if it is a gradient flow of a score function,
i.e.

$$X = \operatorname{grad} s \quad \text{for some } s : V \to \mathbb{R}.$$

(3) $X$ is called locally consistent or triangularly consistent if it is curl-free
on every triangle in $T(E)$, i.e. every 3-clique of $G$.

Clearly any gradient flow must be curl-free everywhere, i.e. the well-known
identity in vector calculus

$$\operatorname{curl} \circ \operatorname{grad} = 0$$

is also true for combinatorial curl and combinatorial gradient (a special case of
Lemma 4.4). So global consistency implies local consistency. A qualified converse
may be deduced from the Hodge decomposition theorem (see also Theorem 5.2): a
curl-free flow on a complete graph must necessarily be a gradient flow, or putting
it another way, a locally consistent pairwise ranking must necessarily be a globally
consistent pairwise ranking when there are no missing values, i.e. if the pairwise
comparison graph is a complete graph (every pair of alternatives has been
compared).

Figure 1. A harmonic pairwise ranking, which is locally consistent on every
triangle but inconsistent along the loop A → B → ...

When G is an incomplete graph, the condition that X is curl-free on every 
triangle in the graph will not be enough to guarantee that it is a gradient flow. 
The reason lies in that curl only takes into account the triangular inconsistency; 
but since there are missing edges in the pairwise comparison graph G, it is possible 
that non-triangular cyclic rankings of lengths greater than three can occur. For
example, Figure 1 shows a pairwise ranking that is locally consistent on every
triangle but globally inconsistent, since it contains a cyclic ranking of length six.
Fortunately, the Hodge decomposition theorem will tell us that all such cyclic rankings
lie in a subspace of harmonic rankings, which can be characterized as the kernel of 
some combinatorial Laplacians. 
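
A small numerical example may help here. The sketch below is not the ranking of Figure 1; it assumes the simplest possible case, a hexagon with no triangles at all, carrying a unit flow around the loop. The flow is trivially curl-free, yet the best least squares score function explains none of it.

```python
import numpy as np

n = 6
edges = [(i, (i + 1) % n) for i in range(n)]   # hexagon: contains no triangle

# Gradient as an |E| x n matrix: (grad s)(i, j) = s_j - s_i.
D0 = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    D0[e, i], D0[e, j] = -1.0, 1.0

# Each step around the loop claims the next alternative is better by 1.
x = np.ones(len(edges))

# Nearest gradient flow in the least squares sense (minimum norm solution).
s, *_ = np.linalg.lstsq(D0, x, rcond=None)
residual = x - D0 @ s
loop_sum = x.sum()     # scores gained after one full loop; 0 for a gradient flow
```

Here `D0.T @ x` vanishes, so the flow is orthogonal to every gradient flow, the fitted scores are all zero, and the whole flow is left in the residual: a purely harmonic ranking.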

3. A Matrix Theoretic View of Hodge Decomposition 

We will see in this section that edge flows, gradient flows, harmonic flows, and 
curl flows can all be represented as specially structured skew-symmetric matrices. 
In this framework, the Hodge decomposition theorem may be viewed as an
orthogonal direct sum decomposition of the space of skew-symmetric matrices into
three subspaces. A formal treatment of combinatorial Hodge theory will be given
in Section 4.

Recall that a matrix $X \in \mathbb{R}^{n \times n}$ is said to be skew-symmetric if $X_{ij} = -X_{ji}$ for
all $i, j \in V := \{1, \dots, n\}$. One knows from linear algebra that any square matrix $A$
may be written uniquely as a sum of a symmetric and a skew-symmetric matrix,

$$A = \tfrac{1}{2}(A + A^\top) + \tfrac{1}{2}(A - A^\top).$$

We will denote

$$\mathcal{A} := \{X \in \mathbb{R}^{n \times n} \mid X^\top = -X\}, \quad \text{and} \quad \mathcal{S} := \{X \in \mathbb{R}^{n \times n} \mid X^\top = X\}.$$

It is perhaps interesting to note that semidefinite programming takes place in the
cone of symmetric positive definite matrices in $\mathcal{S}$ but the optimization problems in
this paper take place in the exterior space $\mathcal{A}$.

Footnote 3: More common notations for $\mathcal{A}$ are $\mathfrak{so}_n(\mathbb{R})$ (Lie algebra of SO(n)) and
$\Lambda^2(\mathbb{R}^n)$ (second exterior product of $\mathbb{R}^n$) but we avoided these since we use almost
no Lie theory and exterior algebra.

One simple way to construct a skew-symmetric matrix is to take a vector
$s = [s_1, \dots, s_n]^\top \in \mathbb{R}^n$ and define $X$ by

$$X_{ij} := s_i - s_j.$$

Note that if $X \neq 0$, then $\operatorname{rank}(X) = 2$ since it can be expressed as $s e^\top - e s^\top$ with
$e := [1, \dots, 1]^\top \in \mathbb{R}^n$. These are in a sense the simplest type of skew-symmetric
matrices: they have the lowest rank possible for a non-zero skew-symmetric matrix
(recall that the rank of a skew-symmetric matrix is necessarily even). In this paper,
we will call these gradient matrices and denote them collectively by $\mathcal{M}_G$,

$$\mathcal{M}_G := \{X \in \mathcal{A} \mid X_{ij} = s_i - s_j \text{ for some } s \in \mathbb{R}^n\}.$$

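For $T \subseteq \binom{V}{3}$, we define the set of $T$-consistent matrices as

$$\mathcal{M}_T := \{X \in \mathcal{A} \mid X_{ij} + X_{jk} + X_{ki} = 0 \text{ for all } \{i,j,k\} \in T\}. \tag{11}$$

We can immediately observe that every $X \in \mathcal{M}_G$ is $T$-consistent for any $T \subseteq \binom{V}{3}$, i.e.
$\mathcal{M}_G \subseteq \mathcal{M}_T$. Conversely, a matrix $X$ that satisfies

$$X_{ij} + X_{jk} + X_{ki} = 0 \quad \text{for every triple } \{i,j,k\} \in \tbinom{V}{3}$$

is necessarily a gradient matrix, i.e.

$$\mathcal{M}_G = \mathcal{M}_{\binom{V}{3}}. \tag{12}$$

Given $T \subseteq \binom{V}{3}$, it is straightforward to verify that both $\mathcal{M}_G$ and $\mathcal{M}_T$ are
subspaces of $\mathbb{R}^{n \times n}$. The preceding discussions then imply the following subspace
relations:

$$\mathcal{M}_G \subseteq \mathcal{M}_T \subseteq \mathcal{A}. \tag{13}$$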
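
The two facts used above, that a non-zero gradient matrix has rank 2 and that consistency on every triple forces a matrix to be a gradient matrix, are easy to verify numerically. The following sketch assumes an arbitrary score vector `s` chosen for illustration and recovers scores from a fully consistent matrix by reading off its first column.

```python
import numpy as np
from itertools import combinations

s = np.array([3.0, 1.0, 4.0, 1.5])         # hypothetical scores
e = np.ones_like(s)
X = np.outer(s, e) - np.outer(e, s)         # gradient matrix, X_ij = s_i - s_j
n = len(s)

rank = np.linalg.matrix_rank(X)

# T-consistency on every triple, i.e. membership in M_T for T = all triples.
consistent = all(abs(X[i, j] + X[j, k] + X[k, i]) < 1e-12
                 for i, j, k in combinations(range(n), 3))

# Converse direction: from X_i0 = s_i - s_0, column 0 gives scores up to a constant.
s_rec = X[:, 0]
X_rec = np.outer(s_rec, e) - np.outer(e, s_rec)
```

The reconstructed matrix `X_rec` coincides with `X` even though `s_rec` differs from `s` by the constant $s_0$, reflecting that scores are determined only up to an additive constant.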

Since these are strict inclusions in general, several complementary subspaces arise
naturally. With respect to the usual inner product $\langle X, Y \rangle = \operatorname{tr}(X^\top Y) = \sum_{i,j} X_{ij} Y_{ij}$,
we obtain orthogonal complements of $\mathcal{M}_G$ and $\mathcal{M}_T$ in $\mathcal{A}$ as well as the orthogonal
complement of $\mathcal{M}_G$ in $\mathcal{M}_T$, which we denote by $\mathcal{M}_H$:

$$\mathcal{A} = \mathcal{M}_G \oplus \mathcal{M}_G^\perp, \qquad \mathcal{A} = \mathcal{M}_T \oplus \mathcal{M}_T^\perp, \qquad \mathcal{M}_T = \mathcal{M}_G \oplus \mathcal{M}_H.$$

We will call the elements of $\mathcal{M}_H$ harmonic matrices as we shall see that they are
discrete analogues of solutions to the Laplace equation (or, more accurately, the
Helmholtz equation). An alternative characterization of $\mathcal{M}_H$ is

$$\mathcal{M}_H = \mathcal{M}_T \cap \mathcal{M}_G^\perp,$$

which may be viewed as a discrete analogue of the condition of being
simultaneously curl-free and divergence-free. More generally, this discussion applies
to any weighted inner product $\langle X, Y \rangle_w = \sum_{i,j} w_{ij} X_{ij} Y_{ij}$. The five subspaces
$\mathcal{M}_G, \mathcal{M}_T, \mathcal{M}_H, \mathcal{M}_T^\perp, \mathcal{M}_G^\perp$ of $\mathcal{A}$ play a central role in our techniques. As we shall see
later, the Helmholtz decomposition in Theorem 4.8 may be viewed as the orthogonal
direct sum decomposition

$$\mathcal{A} = \mathcal{M}_G \oplus \mathcal{M}_H \oplus \mathcal{M}_T^\perp.$$

4. Combinatorial Hodge Theory

In this section we will give a brief introduction to combinatorial Hodge theory,
paying special attention to its relevance in statistical ranking. One may wonder why
we do not rely on our relatively simple matrix view in Section 3. The reasons are
twofold: firstly, important geometric insights are lost when the actual motivations
behind the matrix picture are disregarded; and secondly, the matrix approach
applies only to the case of a 2-dimensional simplicial complex but combinatorial Hodge
theory extends to any $k$-dimensional simplicial complex. While so far we did not
use any simplicial complex of dimension higher than 2 in our study of statistical
ranking, it is conceivable that higher-dimensional simplicial complexes could play a
role in future studies.

4.1. Extension of Pairwise Comparison Graph to Simplicial Complex.

Let $G = (V, E)$ be a pairwise comparison graph. To characterize the triangular
inconsistency or curl, one needs to study the triangles formed by the 3-cliques, i.e.
the set

$$T(E) := \left\{ \{i,j,k\} \in \tbinom{V}{3} \;\middle|\; \{i,j\}, \{j,k\}, \{k,i\} \in E \right\}.$$

A combinatorial object of the form $(V, E, T)$ where $E \subseteq \binom{V}{2}$, $T \subseteq \binom{V}{3}$, and
$\{i,j\}, \{j,k\}, \{k,i\} \in E$ for all $\{i,j,k\} \in T$ is called a 2-dimensional simplicial
complex. This is a generalization of the notion of a graph, which is a 1-dimensional
simplicial complex. In particular, given a graph $G = (V, E)$, the 2-dimensional
simplicial complex $(V, E, T(E))$ is called the 3-clique complex of $G$.

More generally, a simplicial complex $(V, \Sigma)$ is a vertex set $V = \{1, \dots, n\}$
together with a collection $\Sigma$ of subsets of $V$ that is closed under inclusion, i.e. if $\tau \in \Sigma$
and $\sigma \subseteq \tau$, then $\sigma \in \Sigma$. The elements in $\Sigma$ are called simplices. For example, a
0-simplex is just an element $i \in V$ (recall that we do not distinguish between $\binom{V}{1}$
and $V$), a 1-simplex is a pair $\{i,j\} \in \binom{V}{2}$, a 2-simplex is a triple $\{i,j,k\} \in \binom{V}{3}$, and
so on. For $k < n$, a $k$-simplex is a $(k+1)$-element set in $\binom{V}{k+1}$ and $\Sigma_k \subseteq \binom{V}{k+1}$ will
denote the set of all $k$-simplices in $\Sigma$. In the previous paragraph, $\Sigma_0 = V$, $\Sigma_1 = E$,
$\Sigma_2 = T$, and $\Sigma = V \cup E \cup T$. In general, given any undirected graph $G = (V, E)$,
one obtains a $(k-1)$-dimensional simplicial complex $K_G^k := (V, \Sigma)$ called the
$k$-clique complex of $G$ by 'filling in' all its $j$-cliques for $j = 1, \dots, k$, or more
precisely, by setting $\Sigma = \{j\text{-cliques of } G \mid j = 1, \dots, k\}$. The $k$-clique complex of $G$
where $k$ is maximal is just called the clique complex of $G$ and denoted $K_G$.

In this paper, we will mainly concern ourselves with studying the 3-clique
complex $K_G^3 = (V, E, T(E))$ where $G$ is a pairwise comparison graph. Note that we
could also look at the simplicial complex $(V, E, T_\gamma(E))$ where

$$T_\gamma(E) := \{ \{i,j,k\} \in T(E) \mid |X_{ij} + X_{jk} + X_{ki}| \le \gamma \}$$

where $0 \le \gamma \le \infty$. For $\gamma = \infty$, we get $K_G^3$ but for general $\gamma$ we get a subcomplex
of $K_G^3$. We have found this to be a useful multiscale characterization of the
inconsistencies of pairwise rankings but the detailed discussion will have to be left to a
future paper.



Footnote 4: Recall that a $k$-clique of $G$ is just a complete subgraph of $G$ with $k$ vertices.
Footnote 5: Note that a $k$-clique is a $(k-1)$-simplex.

4.2. Cochains, Coboundary Maps, and Combinatorial Laplacians. We will
now introduce some discrete exterior calculus on a simplicial complex where
potential functions (scores or utility), edge flow (pairwise ranking), triangular flow
(triplewise ranking), gradient (global ranking induced by scores), curl (local
inconsistency) become just special cases of a much more general framework. We will now
also define the notions of combinatorial divergence and combinatorial Laplacians.
A 0-dimensional combinatorial Laplacian is just the usual graph Laplacian but the
case of greatest interest to us is the 1-dimensional combinatorial Laplacian, or what
we will call the graph Helmholtzian.

Definition 4.1. Let $K$ be a simplicial complex and recall that $\Sigma_k$ denotes its set
of $k$-simplices. A $k$-dimensional cochain is a real-valued function on $(k+1)$-tuples of
vertices that is alternating on each of the $k$-simplices and 0 otherwise, i.e. $f : V^{k+1} \to \mathbb{R}$
such that

$$f(i_{\sigma(0)}, \dots, i_{\sigma(k)}) = \operatorname{sign}(\sigma)\, f(i_0, \dots, i_k),$$

for all $(i_0, \dots, i_k) \in V^{k+1}$ and all $\sigma \in \mathfrak{S}_{k+1}$, the permutation group on $k+1$ elements,
and that

$$f(i_0, \dots, i_k) = 0 \quad \text{if } \{i_0, \dots, i_k\} \notin \Sigma_k.$$

The set of all $k$-cochains on $K$ is denoted $C^k(K, \mathbb{R})$.

For simplicity we will often just write $C^k$ for $C^k(K, \mathbb{R})$. In particular, $C^0$ is the
space of potential functions (score/utility functions), $C^1$ is the space of edge flows
(pairwise rankings), and $C^2$ is the space of triangular flows (triplewise rankings).

The $k$-cochain space $C^k$ can be given a choice of inner product. In view of the
weighted $l_2$-minimization for our statistical ranking problem (7), we will define the
following inner product on $C^1$,

$$\langle X, Y \rangle_w = \sum_{\{i,j\} \in E} w_{ij} X_{ij} Y_{ij} \tag{14}$$

for all edge flows $X, Y \in C^1$. In the context of a pairwise comparison graph $G$, it
may not be immediately clear why this defines an inner product since we have noted
after (8) that $W = [w_{ij}]$ is only a nonnegative matrix and it is possible that some
entries are 0. However observe that by definition $w_{ij} = 0$ iff no voters have rated
both alternatives $i$ and $j$ and therefore $\{i,j\} \notin E$, and so any edge flow $X$
will automatically have $X_{ij} = 0$ by (5). Hence we indeed have that $\langle X, X \rangle_w = 0$ iff
$X = 0$, as required for an inner product (the other properties are trivial to check).

The operators grad and curl are both special instances of coboundary maps as
defined below.

Definition 4.2. The $k$th coboundary operator $\delta_k : C^k(K, \mathbb{R}) \to C^{k+1}(K, \mathbb{R})$ is
the linear map that takes a $k$-cochain $f \in C^k$ to a $(k+1)$-cochain $\delta_k f \in C^{k+1}$
defined by

$$(\delta_k f)(i_0, i_1, \dots, i_{k+1}) := \sum_{j=0}^{k+1} (-1)^j f(i_0, \dots, i_{j-1}, i_{j+1}, \dots, i_{k+1}).$$

Note that $i_j$ is omitted from the $j$th term in the sum, i.e. coboundary maps compute
an alternating difference with one input left out. In particular, $\delta_0 = \operatorname{grad}$, i.e.
$(\delta_0 s)(i,j) = s_j - s_i$, and $\delta_1 = \operatorname{curl}$, i.e. $(\delta_1 X)(i,j,k) = X_{ij} + X_{jk} + X_{ki}$.

Given a choice of an inner product $\langle \cdot, \cdot \rangle_k$ on $C^k$, we may define the adjoint
operator of the coboundary map, $\delta_k^* : C^{k+1} \to C^k$, in the usual manner, i.e.

$$\langle \delta_k f_k, g_{k+1} \rangle_{k+1} = \langle f_k, \delta_k^* g_{k+1} \rangle_k.$$

Definition 4.3. The combinatorial divergence operator $\operatorname{div} : C^1(K, \mathbb{R}) \to C^0(K, \mathbb{R})$
is the negative of the adjoint of $\delta_0 = \operatorname{grad}$, i.e.

$$\operatorname{div} := -\delta_0^*. \tag{15}$$

Divergence will appear in the minimum norm solution to (7) and can be used to
characterize $\mathcal{M}_G^\perp$. As usual, we will drop the adjective 'combinatorial' when there
is no cause for confusion.

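For statistical ranking, it suffices to consider the cases $k = 0, 1, 2$. Let $G$ be a
pairwise comparison graph and $K_G$ its clique complex. The cochain maps,

$$C^0(K_G, \mathbb{R}) \xrightarrow{\ \delta_0\ } C^1(K_G, \mathbb{R}) \xrightarrow{\ \delta_1\ } C^2(K_G, \mathbb{R}) \tag{16}$$

and their adjoints,

$$C^0(K_G, \mathbb{R}) \xleftarrow{\ \delta_0^*\ } C^1(K_G, \mathbb{R}) \xleftarrow{\ \delta_1^*\ } C^2(K_G, \mathbb{R}), \tag{17}$$

have the following ranking theoretic interpretation with $C^0, C^1, C^2$ representing
the spaces of score or utility functions, pairwise rankings, and triplewise rankings
respectively,

$$\text{scores} \xrightarrow{\ \operatorname{grad}\ } \text{pairwise} \xrightarrow{\ \operatorname{curl}\ } \text{triplewise},$$

$$\text{scores} \xleftarrow{\ -\operatorname{div} = \operatorname{grad}^*\ } \text{pairwise} \xleftarrow{\ \operatorname{curl}^*\ } \text{triplewise}.$$

In summary, the formulas for combinatorial gradient, curl, and divergence are given
by

$$(\operatorname{grad} s)(i,j) = (\delta_0 s)(i,j) = s_j - s_i,$$
$$(\operatorname{curl} X)(i,j,k) = (\delta_1 X)(i,j,k) = X_{ij} + X_{jk} + X_{ki},$$
$$(\operatorname{div} X)(i) = -(\delta_0^* X)(i) = \sum_{j \text{ s.t. } \{i,j\} \in E} w_{ij} X_{ij}$$

with respect to the inner product $\langle X, Y \rangle_w = \sum_{\{i,j\} \in E} w_{ij} X_{ij} Y_{ij}$ on $C^1$.

As an aside, it is perhaps worth pointing out that there is no special name for
the adjoint of curl coming from physics because in 3-space, $C^1$ may be identified
with $C^2$ via a property called Hodge duality, in which case curl is a self-adjoint
operator, i.e. $\operatorname{curl}^* = \operatorname{curl}$. This will not be true in our case.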

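If we represent functions on vertices by $n$-vectors, edge flows by $n \times n$
skew-symmetric matrices, and triangular flows by $n \times n \times n$ skew-symmetric
hypermatrices, i.e.

$$C^0 = \mathbb{R}^n,$$
$$C^1 = \{[X_{ij}] \in \mathbb{R}^{n \times n} \mid X_{ij} = -X_{ji}\} = \mathcal{A},$$
$$C^2 = \{[\Phi_{ijk}] \in \mathbb{R}^{n \times n \times n} \mid \Phi_{ijk} = \Phi_{jki} = \Phi_{kij} = -\Phi_{jik} = -\Phi_{ikj} = -\Phi_{kji}\},$$

then in the language of linear algebra introduced in Section 3 we have the following
correspondence

$$\operatorname{im}(\delta_0) = \operatorname{im}(\operatorname{grad}) = \mathcal{M}_G, \qquad \ker(\delta_1) = \ker(\operatorname{curl}) = \mathcal{M}_T,$$
$$\ker(\delta_0^*) = \ker(\operatorname{div}) = \mathcal{M}_G^\perp, \qquad \operatorname{im}(\delta_1^*) = \operatorname{im}(\operatorname{curl}^*) = \mathcal{M}_T^\perp,$$

where $T = T(E)$.

Footnote 6: It does not matter whether we consider $K_G$ or $K_G^3$ or indeed any $K_G^k$ where $k \ge 3$; the
higher-dimensional $k$-simplices where $k \ge 3$ do not play a role in the coboundary maps $\delta_0, \delta_1, \delta_2$.

Coboundary maps have the following important property.

Lemma 4.4 (Closedness). $\delta_{k+1} \circ \delta_k = 0$.

For $k = 0$, this and its adjoint are well-known identities in vector calculus,

$$\operatorname{curl} \circ \operatorname{grad} = 0, \qquad \operatorname{div} \circ \operatorname{curl}^* = 0. \tag{18}$$

Ranking theoretically, the first identity simply says that a global ranking must be
consistent.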

We will now define combinatorial Laplacians, higher-dimensional analogues of
the graph Laplacian.

Definition 4.5. Let $K$ be a simplicial complex. The $k$-dimensional combinatorial
Laplacian is the operator $\Delta_k : C^k(K, \mathbb{R}) \to C^k(K, \mathbb{R})$ defined by

$$\Delta_k = \delta_k^* \circ \delta_k + \delta_{k-1} \circ \delta_{k-1}^*. \tag{19}$$

In particular, for $k = 0$,

$$\Delta_0 = \delta_0^* \circ \delta_0 = -\operatorname{div} \circ \operatorname{grad}$$

is a discrete analogue of the scalar Laplacian or Laplace operator while for $k = 1$,

$$\Delta_1 = \delta_1^* \circ \delta_1 + \delta_0 \circ \delta_0^* = \operatorname{curl}^* \circ \operatorname{curl} - \operatorname{grad} \circ \operatorname{div}$$

is a discrete analogue of the vector Laplacian or Helmholtz operator. In the context
of graph theory, if $K = K_G$, then $\Delta_0$ is called the graph Laplacian [11] while $\Delta_1$ is
called the graph Helmholtzian.

The combinatorial Laplacian has some well-known, important properties.

Lemma 4.6. $\Delta_k$ is a positive semidefinite operator. Furthermore, the dimension
of $\ker(\Delta_k)$ is equal to the $k$th Betti number of $K$.

We will call a cochain $f \in \ker(\Delta_k)$ harmonic since such cochains are solutions to a
higher-dimensional analogue of the Laplace equation

$$\Delta_k f = 0.$$

Strictly speaking, the Laplace equation refers to $\Delta_0 f = 0$. The equation $\Delta_1 X = 0$
is really the Helmholtz equation. But nonetheless, we will still call an edge flow
$X \in \ker(\Delta_1)$ a harmonic flow.
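
To see these operators as matrices, the sketch below builds $\delta_0$ and $\delta_1$ for a small graph and checks the closedness identity and the Betti number statement numerically. The graph (a filled triangle attached to an unfilled 4-cycle) and the variable names `D0`, `D1`, `L0`, `L1` are assumptions for illustration, and unweighted inner products are assumed throughout so that each adjoint is simply a matrix transpose.

```python
import numpy as np
from itertools import combinations

V = range(5)
E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (0, 4)]   # each edge stored with i < j
Eset = set(E)
# 3-cliques: triples all of whose pairs are edges.
T = [t for t in combinations(V, 3)
     if all(p in Eset for p in combinations(t, 2))]

idx = {e: r for r, e in enumerate(E)}

# delta_0 = grad: (grad s)(i, j) = s_j - s_i, one row per edge.
D0 = np.zeros((len(E), len(V)))
for r, (i, j) in enumerate(E):
    D0[r, i], D0[r, j] = -1.0, 1.0

# delta_1 = curl: X_ij + X_jk + X_ki = X_ij + X_jk - X_ik for i < j < k.
D1 = np.zeros((len(T), len(E)))
for r, (i, j, k) in enumerate(T):
    D1[r, idx[(i, j)]] = 1.0
    D1[r, idx[(j, k)]] = 1.0
    D1[r, idx[(i, k)]] = -1.0

L0 = D0.T @ D0                   # graph Laplacian
L1 = D1.T @ D1 + D0 @ D0.T       # graph Helmholtzian

betti0 = len(V) - np.linalg.matrix_rank(L0)   # number of connected components
betti1 = len(E) - np.linalg.matrix_rank(L1)   # number of unfilled independent loops
```

In this example the only triangle is $\{0,1,2\}$, the product `D1 @ D0` vanishes, and the kernel of the Helmholtzian is one-dimensional, corresponding to the unfilled loop 0-2-3-4-0.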

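4.3. Hodge Decomposition Theorem. We now state the main theorem in
combinatorial Hodge theory.

Theorem 4.7 (Hodge Decomposition Theorem). $C^k(K, \mathbb{R})$ admits an orthogonal
decomposition

$$C^k(K, \mathbb{R}) = \operatorname{im}(\delta_{k-1}) \oplus \ker(\Delta_k) \oplus \operatorname{im}(\delta_k^*).$$

Furthermore,

$$\ker(\Delta_k) = \ker(\delta_k) \cap \ker(\delta_{k-1}^*).$$

An elementary proof targeted at a computer science readership may be found in
[15]. For completeness we include a proof here.

Figure 2. Hodge/Helmholtz decomposition of pairwise rankings. [Diagram labels:
ker(div), inconsistent (divergence-free); ker(curl), locally consistent (curl-free).]

Proof. We will use Lemma 4.4. First, $C^k = \operatorname{im}(\delta_{k-1}) \oplus \ker(\delta_{k-1}^*)$. Since $\delta_k \delta_{k-1} = 0$,
taking adjoint yields $\delta_{k-1}^* \delta_k^* = 0$, which implies that $\operatorname{im}(\delta_k^*) \subseteq \ker(\delta_{k-1}^*)$. Therefore

$$\ker(\delta_{k-1}^*) = [\operatorname{im}(\delta_k^*) \oplus \ker(\delta_k)] \cap \ker(\delta_{k-1}^*)$$
$$= [\operatorname{im}(\delta_k^*) \cap \ker(\delta_{k-1}^*)] \oplus [\ker(\delta_k) \cap \ker(\delta_{k-1}^*)]$$
$$= \operatorname{im}(\delta_k^*) \oplus [\ker(\delta_k) \cap \ker(\delta_{k-1}^*)].$$

It remains to show that $\ker(\delta_k) \cap \ker(\delta_{k-1}^*) = \ker(\Delta_k) = \ker(\delta_{k-1} \delta_{k-1}^* + \delta_k^* \delta_k)$.
Clearly $\ker(\delta_k) \cap \ker(\delta_{k-1}^*) \subseteq \ker(\Delta_k)$. For any nonzero $X = \delta_k^* \Phi \in \operatorname{im}(\delta_k^*)$ where
$\Phi \in C^{k+1}$, Lemma 4.4 again implies $\delta_{k-1} \delta_{k-1}^* X = \delta_{k-1} \delta_{k-1}^* \delta_k^* \Phi = 0$, but
$\delta_k^* \delta_k X = \delta_k^* \delta_k \delta_k^* \Phi \neq 0$, which implies that $\Delta_k X \neq 0$. Similarly for $X \in \operatorname{im}(\delta_{k-1})$.
Hence $\ker(\Delta_k) = \ker(\delta_k) \cap \ker(\delta_{k-1}^*)$. □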

While Hodge decomposition holds in general for any simplicial complex and in
any dimension $k$, the case $k = 1$ is more often called the Helmholtz decomposition
theorem. We will state it here for the special case of a clique complex.

Theorem 4.8 (Helmholtz Decomposition Theorem). Let $G = (V, E)$ be an
undirected, unweighted graph and $K_G$ be its clique complex. The space of edge flows on
$G$, i.e. $C^1(K_G, \mathbb{R})$, admits an orthogonal decomposition

$$C^1(K_G, \mathbb{R}) = \operatorname{im}(\delta_0) \oplus \ker(\Delta_1) \oplus \operatorname{im}(\delta_1^*) = \operatorname{im}(\operatorname{grad}) \oplus \ker(\Delta_1) \oplus \operatorname{im}(\operatorname{curl}^*). \tag{20}$$

Furthermore,

$$\ker(\Delta_1) = \ker(\delta_1) \cap \ker(\delta_0^*) = \ker(\operatorname{curl}) \cap \ker(\operatorname{div}). \tag{21}$$

The clique complex $K_G$ above may be substituted with any $K_G^k$ with $k \ge 3$ (see
Footnote 6). Equation (21) says that an edge flow is harmonic iff it is both
curl-free and divergence-free. Figure 2 illustrates (20).



Footnote 7: On a simply connected manifold, the continuous version of the Helmholtz
decomposition theorem is just the fundamental theorem of vector calculus.

To understand the significance of this theorem, we need to discuss the ranking
theoretic interpretations of each subspace in the theorem.

(1) $\operatorname{im}(\delta_0) = \operatorname{im}(\operatorname{grad})$ denotes the subspace of pairwise rankings that are the
gradient flows of score functions. Thus this subspace comprises the globally
consistent or acyclic pairwise rankings. Given any pairwise ranking from
this subspace, we may determine a score function on the alternatives that
is unique up to an additive constant and then we may rank all alternatives
globally in terms of their scores.

(2) $\ker(\delta_0^*) = \ker(\operatorname{div})$ denotes the subspace of divergence-free pairwise
rankings, whose total in-flow equals total out-flow for each alternative $i \in V$.
Such pairwise rankings may be regarded as cyclic rankings, i.e. rankings
of the form $i \succeq j \succeq k \succeq \cdots \succeq i$, and they are clearly inconsistent. Since
$\ker(\operatorname{div}) = \operatorname{im}(\operatorname{grad})^\perp$, cyclic rankings have zero projection on global
rankings.

(3) $\ker(\delta_1) = \ker(\operatorname{curl})$ denotes the subspace of curl-free pairwise rankings with
zero flow-sum along any triangle in $K_G$. This corresponds to locally
consistent (i.e. triangularly consistent) pairwise rankings. Note that by the
Closedness Lemma $\operatorname{curl} \circ \operatorname{grad} = 0$ and so $\operatorname{im}(\operatorname{grad}) \subseteq \ker(\operatorname{curl})$. In
general, the globally consistent pairwise rankings induced by gradient flows
of score functions only account for a subset of locally consistent rankings.
The remaining ones are the locally consistent rankings that are not globally
consistent and they are precisely the harmonic rankings discussed below.

(4) $\ker(\Delta_1) = \ker(\operatorname{curl}) \cap \ker(\operatorname{div})$ denotes the subspace of harmonic pairwise
rankings, or just harmonic rankings in short. It is the space of solutions
to the Helmholtz equation. Harmonic rankings are exactly those pairwise
rankings that are both curl-free and divergence-free. These are only locally
consistent with zero curl on every triangle in $T(E)$ but are not globally
consistent. In other words, while there are no inconsistencies due to small
loops of length 3, i.e. $i \succeq j \succeq k \succeq i$, there are inconsistencies along larger
loops of lengths > 3, i.e. $a \succeq b \succeq c \succeq \cdots \succeq z \succeq a$. So these are also cyclic
rankings. Rank aggregation on $\ker(\Delta_1)$ depends on the edge paths traversed
in the simplicial complex; along homotopy equivalent paths one obtains
consistent rankings. Figure 1 gives an example of harmonic rankings.

(5) $\operatorname{im}(\delta_1^*) = \operatorname{im}(\operatorname{curl}^*)$ denotes the subspace of locally cyclic pairwise
rankings that have non-zero curls along triangles. By the Closedness Lemma,
$\operatorname{im}(\operatorname{curl}^*) \subseteq \ker(\operatorname{div})$ and so this subspace is in general a proper subspace
of the divergence-free rankings; the orthogonal complement of $\operatorname{im}(\operatorname{curl}^*)$
in $\ker(\operatorname{div})$ is precisely the space of harmonic rankings $\ker(\Delta_1)$ discussed
above.
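
The five subspaces can be made tangible with a short numerical decomposition. The sketch below assumes an unweighted inner product, a small hypothetical graph (a filled triangle $\{0,1,2\}$ attached to an unfilled 4-cycle), and a random edge flow; the projectors are formed with the Moore-Penrose pseudoinverse, which is one of several ways to compute them.

```python
import numpy as np

n = 5
E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (0, 4)]   # edges with i < j
T = [(0, 1, 2)]                                          # the only 3-clique

D0 = np.zeros((len(E), n))                 # grad: (grad s)(i, j) = s_j - s_i
for r, (i, j) in enumerate(E):
    D0[r, i], D0[r, j] = -1.0, 1.0
D1 = np.zeros((len(T), len(E)))            # curl on each triangle i < j < k
for r, (i, j, k) in enumerate(T):
    D1[r, E.index((i, j))] = 1.0
    D1[r, E.index((j, k))] = 1.0
    D1[r, E.index((i, k))] = -1.0          # X_ki = -X_ik

rng = np.random.default_rng(1)
x = rng.normal(size=len(E))                # an arbitrary edge flow

P_grad = D0 @ np.linalg.pinv(D0)           # orthogonal projector onto im(grad)
P_curl = D1.T @ np.linalg.pinv(D1.T)       # orthogonal projector onto im(curl*)
g = P_grad @ x                             # globally consistent component
c = P_curl @ x                             # locally cyclic component
h = x - g - c                              # harmonic component
```

The three components are mutually orthogonal and sum to `x`; the harmonic part `h` is annihilated by both `D1` (curl-free) and `D0.T` (divergence-free), while `g` is curl-free and `c` is divergence-free.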

5. Implications of Hodge Theory

We now state two immediate implications of the Helmholtz decomposition
theorem when applied to statistical ranking. The first implication is that it gives an
interpretation of the solution and residual of the optimization problem (7); these
are respectively the $l_2$-projection on gradient flows and divergence-free flows. In
the context of statistical ranking and in the $l_2$-sense, the solution to (7) gives the
nearest globally consistent pairwise ranking to the data while the residual gives the
sum total of all inconsistent components (both local and harmonic) in the data.
The second implication is the condition that local consistency guarantees global
consistency whenever there is no harmonic component in the data (which happens
iff the clique complex of the pairwise comparison graph is 'loop-free').

Footnote 8: Note that $\ker(\delta_0) = \ker(\operatorname{grad})$ is the set of constant functions on $V$ and so
$\operatorname{grad}(s) = \operatorname{grad}(s + \text{constant})$.

5.1. Structure Theorem for Global Ranking and the Residual of
Inconsistency. In order to cast our optimization problem (7) in the Hodge theoretic
framework, we need to specify relevant inner products on $C^0, C^1, C^2$. As before,
the inner product on the space of edge flows (pairwise rankings) $C^1$ will be a
weighted Euclidean inner product

$$\langle X, Y \rangle_w = \sum_{\{i,j\} \in E} w_{ij} X_{ij} Y_{ij}$$

for $X, Y \in C^1$. We will let the inner products on $C^0$ and $C^2$ be the unweighted
Euclidean inner products

$$\langle r, s \rangle = \sum_{i=1}^{n} r_i s_i, \qquad \langle \Theta, \Phi \rangle = \sum_{\{i,j,k\} \in T(E)} \Theta_{ijk} \Phi_{ijk}$$

for $r, s \in C^0$ and $\Theta, \Phi \in C^2$. We note that other inner products can be chosen (e.g.
the inner products on $C^0$ and $C^2$ could have been weighted) with corresponding
straightforward modification of (7) but this would not change the essential nature of
our methods. We made the above choices mainly to keep our notations uncluttered.

The optimization problem (7) is then equivalent to an $l_2$-projection of an edge
flow representing a pairwise ranking onto $\operatorname{im}(\operatorname{grad})$,

$$\min_{s \in C^0} \| \delta_0 s - Y \|_{2,w} = \min_{s \in C^0} \| \operatorname{grad} s - Y \|_{2,w}.$$

The Helmholtz decomposition theorem then leads to the following result about the
structures of the solutions and residuals of (7). In Theorem 5.1 below, we assume
that the pairwise ranking data $Y$ has been estimated from one of the methods in
Section 2.2.1. The least squares solution $s$ will be a score function that induces
$\operatorname{grad} s$, the $l_2$-nearest global ranking to $Y$. Since $s$ is only unique up to a constant
(see Footnote 8), we determine a unique minimum norm solution $s^*$ for the sake
of well-posedness; but nevertheless any $s$ will yield the same global ordering of
alternatives. The least squares residual $R^*$ represents the inconsistent component
of the ranking data $Y$. The magnitude of $R^*$ is a 'certificate of reliability' for $s$;
since if this is small, then the globally consistent component $\operatorname{grad} s$ accounts for
most of the variation in $Y$ and we may conclude that $s$ gives a reasonably reliable
ranking of the alternatives. But even when the magnitude of $R^*$ is large, we will see
that it may be further resolved into a global and a local component that determine
when a comparison of alternatives with respect to $s$ is still valid.

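Theorem 5.1. (i) Solutions of (7) satisfy the following normal equation

$$\Delta_0 s = -\operatorname{div} Y, \tag{22}$$

and thus the minimum norm solution is

$$s^* = -\Delta_0^\dagger \operatorname{div} Y, \tag{23}$$

where $\dagger$ indicates a Moore-Penrose inverse. The divergence in (23) is given
by

$$(\operatorname{div} Y)(i) = \sum_{j \text{ s.t. } \{i,j\} \in E} w_{ij} Y_{ij},$$

and the matrix representing the graph Laplacian is given by

$$[\Delta_0]_{ij} = \begin{cases} \sum_{k \text{ s.t. } \{i,k\} \in E} w_{ik} & \text{if } j = i, \\ -w_{ij} & \text{if } j \text{ is such that } \{i,j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$
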


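(ii) The residual $R^* = Y - \delta_0 s^*$ is divergence-free, i.e. $\operatorname{div} R^* = 0$. Moreover, it
has a further orthogonal decomposition

$$R^* = \operatorname{proj}_{\operatorname{im}(\operatorname{curl}^*)} Y + \operatorname{proj}_{\ker(\Delta_1)} Y, \tag{24}$$

where $\operatorname{proj}_{\operatorname{im}(\operatorname{curl}^*)} Y$ is a local cyclic ranking accounting for local
inconsistencies and $\operatorname{proj}_{\ker(\Delta_1)} Y$ is a harmonic ranking accounting for global
inconsistencies. In particular, the projections are given by

$$\operatorname{proj}_{\operatorname{im}(\operatorname{curl}^*)} = \operatorname{curl}^\dagger \operatorname{curl} \quad \text{and} \quad \operatorname{proj}_{\ker(\Delta_1)} = I - \Delta_1^\dagger \Delta_1. \tag{25}$$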

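Proof. The normal equation for the least squares problem $\min_{s \in C^0} \| \delta_0 s - Y \|_{2,w}$ is

$$\delta_0^* \delta_0 s = \delta_0^* Y.$$

(22), (23), and $\operatorname{div} R^* = 0$ are obvious upon substituting $\Delta_0 = \delta_0^* \delta_0$ and
$\operatorname{div} = -\delta_0^*$. The expressions for divergence and graph Laplacian in (i) follow from their
respective definitions. The Helmholtz decomposition theorem implies

$$\ker(\Delta_1) \oplus \operatorname{im}(\operatorname{curl}^*) = \operatorname{im}(\operatorname{grad})^\perp.$$

Obviously $\operatorname{proj}_{\operatorname{im}(\operatorname{grad})^\perp} \operatorname{grad} s^* = 0$. Since $R^* = Y - \operatorname{grad} s^*$ is a least squares
residual, we must have $\operatorname{proj}_{\operatorname{im}(\operatorname{grad})} R^* = \operatorname{proj}_{\operatorname{im}(\operatorname{grad})} Y - \operatorname{grad} s^* = 0$. These observations
yield (24), as

$$R^* = \operatorname{proj}_{\operatorname{im}(\operatorname{grad})} R^* + \operatorname{proj}_{\operatorname{im}(\operatorname{grad})^\perp} R^* = 0 + \operatorname{proj}_{\ker(\Delta_1) \oplus \operatorname{im}(\operatorname{curl}^*)} Y.$$

The expression for the projection in (25) is standard. □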
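
Part (i) of the theorem translates directly into a few lines of linear algebra. The following sketch assumes a hypothetical symmetric weight matrix `W` (with $w_{ij} = 0$ marking a missing edge), a hypothetical score vector `s_true`, and a small skew-symmetric perturbation `noise`; it forms the weighted graph Laplacian and divergence, computes the minimum norm solution with `np.linalg.pinv`, and checks that the residual is divergence-free.

```python
import numpy as np

# Hypothetical weights: w_ij = number of voters who rated both i and j.
W = np.array([[0, 3, 1, 0],
              [3, 0, 2, 1],
              [1, 2, 0, 4],
              [0, 1, 4, 0]], dtype=float)
mask = W > 0                                   # edge set E

s_true = np.array([0.0, 1.0, 3.0, 2.0])
noise = np.array([[0.0, 0.5, -0.2, 0.0],
                  [-0.5, 0.0, 0.3, -0.1],
                  [0.2, -0.3, 0.0, 0.4],
                  [0.0, 0.1, -0.4, 0.0]])      # skew-symmetric inconsistency
# Pairwise ranking data Y_ij = s_j - s_i + noise, supported on E only.
Y = (s_true[None, :] - s_true[:, None] + noise) * mask

L0 = np.diag(W.sum(axis=1)) - W                # weighted graph Laplacian
div_Y = (W * Y).sum(axis=1)                    # (div Y)(i) = sum_j w_ij Y_ij
s_star = -np.linalg.pinv(L0) @ div_Y           # minimum norm solution (23)

grad_s = (s_star[None, :] - s_star[:, None]) * mask
R = Y - grad_s                                 # least squares residual
div_R = (W * R).sum(axis=1)
```

The vector `s_star` sums to zero (the minimum norm normalisation), satisfies the normal equation (22), and leaves a residual whose weighted divergence vanishes at every vertex.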

In the special case when the pairwise ranking matrix G is a complete graph and 
we have an unweighted Euclidean inner product on , the minimum norm solution 
s* in (23 1 satisfies J2i = and is given by 

(26) s: = --dw{Ym = --Y,Y,,. 

In Section 7 we shall see that this is the well-known Borda count in social choice
theory, a measure that is also widely used in psychology and statistics [26, 25, 30,
31, 13]. Since $G$ is a complete graph only when the ranking data is complete, i.e.
every voter has rated every alternative, this is an unrealistic scenario for the type
of modern ranking data discussed in Section 1. Among other things, the Hodge
theoretic framework generalizes Borda count to scenarios where the ranking data
is incomplete or even highly incomplete.
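
The closed form (26) is easy to check numerically. The sketch below assumes a complete graph with unit weights and represents $Y$ as a full skew-symmetric matrix; the function name `borda_scores` and the sample data are hypothetical.

```python
from fractions import Fraction

def borda_scores(Y):
    """s*_i = -(1/n) * sum_j Y_ij for a complete graph with unit weights."""
    n = len(Y)
    return [-Fraction(sum(row), n) for row in Y]

# A globally consistent flow Y_ij = s_j - s_i built from zero-mean scores.
s_true = [Fraction(-3, 2), Fraction(-1, 2), Fraction(1, 2), Fraction(3, 2)]
Y_consistent = [[sj - si for sj in s_true] for si in s_true]
s_recovered = borda_scores(Y_consistent)

# A purely cyclic flow on three alternatives: 0 -> 1 -> 2 -> 0.
Y_cyclic = [[0, 1, -1],
            [-1, 0, 1],
            [1, -1, 0]]
s_cyclic = borda_scores(Y_cyclic)
print(s_recovered, s_cyclic)
```

A consistent flow returns the scores it was built from, while a purely cyclic flow gives every alternative the score zero, so all of its content ends up in the residual.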

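In (24), the locally cyclic ranking component is obtained by solving

$\operatorname{argmin}_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - R^*\|_{2,w} = \operatorname{argmin}_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - Y\|_{2,w}$.

The two problems share the same minimizers because $Y - R^* = \operatorname{grad} s^*$ is orthogonal
to both $R^*$ and $\operatorname{im}(\operatorname{curl}^*)$, so the squared objectives differ only by the constant
$\|\operatorname{grad} s^*\|_{2,w}^2$. Hence there is no need to first solve for $R^*$ before we may
obtain $\Phi$; one could get it directly from the pairwise ranking data $Y$. Note that
the solution is only determined up to an additive term of the form $\operatorname{grad} s$ since by
virtue of (18),

(27) $\operatorname{curl}(\Phi + \operatorname{grad} s) = \operatorname{curl} \Phi$.

For the sake of well-posedness, we will seek the unique minimum norm solution
given by

$\Phi^* = (\delta_1 \circ \delta_1^*)^\dagger \delta_1 Y = (\operatorname{curl} \circ \operatorname{curl}^*)^\dagger \operatorname{curl} Y$

and the required component is given by $\operatorname{proj}_{\operatorname{im}(\operatorname{curl}^*)} Y = \operatorname{curl}^* \Phi^*$. The reader may
have noted a parallel between the two problems

$\min_{s \in C^0} \|\operatorname{grad} s - Y\|_{2,w}$ and $\min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - Y\|_{2,w}$.

Indeed in many contexts, $s$ is called the scalar potential while $\Phi$ is called the vector
potential. As seen earlier in Definition 2.1, an edge flow of the form $\operatorname{grad} s$ for some
$s \in C^0$ is called a gradient flow; in analogy, we will call an edge flow of the form
$\operatorname{curl}^* \Phi$ for some $\Phi \in C^2$ a curl flow.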

We note that the $\ell_2$-residual $R^*$, being divergence-free, is a cyclic ranking. Much
like (27), the divergence-free condition is satisfied by a whole family of edge flows
that differs from $R^*$ only by a term of the form $\operatorname{curl}^* \Phi$ since

$\operatorname{div}(R^* + \operatorname{curl}^* \Phi) = \operatorname{div} R^* = 0$

because of (18). The subset of $C^1$ given by

$\{R^* + \operatorname{curl}^* \Phi \mid \Phi \in C^2\}$

is called the homology class of $R^*$. The harmonic ranking $\operatorname{proj}_{\ker(\Delta_1)} Y$ is just
one element in this class. In general, it will be dense in the sense that it will be
nonzero on almost every edge in $E$. This is because in addition to the divergence-free
condition, the harmonic ranking must also satisfy the curl-free condition by virtue
of (21). So if parsimony or sparsity is the objective, e.g. if one wants to identify
a small number of conflicting comparisons that give rise to the inconsistencies in
the ranking data, then the harmonic ranking does not offer much information in
this regard. To better understand ranking inconsistencies via the structure of $R^*$,
it is often helpful to look for elements in the same homology class with the sparsest
support, i.e.

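$\min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - R^*\|_0 = \min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - \operatorname{proj}_{\ker(\Delta_1)} Y\|_0$.

The widely used convex relaxation replacing the $\ell_0$-'norm' by the $\ell_1$-norm may
be employed [21], i.e.

$\min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - R^*\|_1 := \min_{\Phi \in C^2} \sum_{i,j} |(\operatorname{curl}^* \Phi)_{ij} - R^*_{ij}|$.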

A solution $\tilde{\Phi}$ of such an $\ell_1$-minimization problem is expected to give a sparse element
$R^* - \operatorname{curl}^* \tilde{\Phi}$, which we call an $\ell_1$-approximate sparse generator of $R^*$, or equivalently,
of $\operatorname{proj}_{\ker(\Delta_1)} Y$. We will discuss them in detail in Section 6.2. The bottom line here
is that we want to find the shortest cycles that represent the global inconsistencies
and perhaps remove the corresponding edges in the pairwise comparison graph, in
view of what we will discuss next in Section 5.2. One plausible strategy to get
a globally consistent ranking is to remove a number of problematic 'conflicting'
comparisons from the pairwise comparison graph. Since it is only reasonable to
remove as few edges as possible, this translates to finding a homology class with
the sparsest support. This is similar to the minimum feedback arc set approach
discussed earlier.



We will end the discussion of this section with a note on computational costs.
Solving for a global ranking $s^*$ in (23) only requires the solution of an $n \times n$ least
squares problem, which comes with a modest cost of $O(n^3)$ flops ($n = |V|$). As
we note later in Section 8.3, for web ranking analysis such a cost is no more than
computing the PageRank. On the other hand, the analysis of inconsistency is
generally harder. For example, evaluating curls requires $|T|$ flops and this is $\binom{n}{3} \sim O(n^3)$
in the worst case. Since an actual computation of $\Phi^*$ involves solving a
least squares problem of size $|T| \times |T|$, the computation cost incurred is of order
$O(n^9)$. Nevertheless, any sparsity in the data (when $|T| \ll n^3$) may be exploited by
choosing the right least squares solver. For example, one may use the general sparse
least squares solver LSQR [33] or the new minres-QLP [5, 6] that works specifically
for symmetric matrices. We will leave discussions of actual computations and more
extensive numerical experiments to a future article. It suffices to note here that it
is in general harder to isolate the harmonic component of the ranking data than
the globally consistent component.

[Footnote: Two elements of the same homology class are called homologous.]

5.2. Local Consistency versus Global Consistency. In this section, we discuss
a useful result, that local consistency implies global consistency whenever the
harmonic component is absent from the ranking data. Whether a harmonic component
exists is dependent on the topology of the clique complex $K_G$. We will invoke
the recent work of Kahle [22] on such topological properties of random graphs to
argue that harmonic components are exceedingly unlikely to occur.

By Lemma 4.6 the dimension of $\ker(\Delta_1)$ is equal to the first Betti number $\beta_1(K)$
of the underlying simplicial complex $K$. In particular, we know that $\ker(\Delta_1) = 0$ if
$\beta_1(K) = 0$, and so the harmonic component of any edge flow on $K$ is automatically
absent when $\beta_1(K) = 0$ (roughly speaking, $\beta_1(K) = 0$ means that $K$ does not have
any 1-dimensional holes). This leads to the following result.

Theorem 5.2. Let $K_G = (V, E, T(E))$ be a 3-clique complex of a pairwise comparison
graph $G = (V, E)$. If $K_G$ does not contain any 1-loops, i.e. $\beta_1(K_G) = 0$, then
every locally consistent pairwise ranking is also globally consistent. In other words,
if the edge flow $X \in C^1(K_G, \mathbb{R})$ is curl-free, i.e.

$\operatorname{curl}(X)(i,j,k) = 0$

for all $\{i,j,k\} \in T(E)$, then it is a gradient flow, i.e. there exists $s \in C^0(K_G, \mathbb{R})$
such that

$X = \operatorname{grad} s$.

Proof. This follows from the Helmholtz decomposition theorem since $\dim(\ker \Delta_1) =
\beta_1(K_G) = 0$ and so any $X$ that is curl-free is automatically in $\operatorname{im}(\operatorname{grad})$. □
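
To make the hypothesis $\beta_1(K_G) = 0$ concrete, here is a small sketch that computes the first Betti number of a 3-clique complex as $\dim \ker(\operatorname{curl}) - \dim \operatorname{im}(\operatorname{grad})$, using exact rank computations. The helper names `rank` and `betti_1` are assumptions for this illustration, and the brute-force triangle enumeration is only meant for tiny graphs.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    ncols = len(M[0]) if M else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            if M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def betti_1(n, edges):
    """First Betti number of the 3-clique complex of the graph (range(n), edges)."""
    edges = sorted(tuple(sorted(e)) for e in edges)
    index = {e: k for k, e in enumerate(edges)}
    # grad: one row per edge (i, j) with i < j, encoding s_j - s_i.
    grad = []
    for i, j in edges:
        row = [0] * n
        row[i], row[j] = -1, 1
        grad.append(row)
    # curl: one row per triangle i < j < k, encoding X_ij + X_jk + X_ki.
    curl = []
    for i, j, k in combinations(range(n), 3):
        if (i, j) in index and (j, k) in index and (i, k) in index:
            row = [0] * len(edges)
            row[index[i, j]] = 1
            row[index[j, k]] = 1
            row[index[i, k]] = -1   # X_ki = -X_ik
            curl.append(row)
    # dim ker(curl) - dim im(grad)
    return (len(edges) - rank(curl)) - rank(grad)

four_cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
complete_4 = list(combinations(range(4), 2))
print(betti_1(4, four_cycle), betti_1(4, complete_4))
```

The 4-cycle has no triangles, so every edge flow on it is trivially curl-free, yet the flow circulating around the cycle is not a gradient; this is the 1-dimensional hole counted by $\beta_1 = 1$. Filling in all edges removes the hole.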

When $G$ is a complete graph, then we always have that $\beta_1(K_G) = 0$
and this justifies the discussion after Definition 2.3 about the equivalence of
local and global consistencies for complete pairwise comparison graphs. In general,
$G$ will be incomplete due to missing ranking data (not all voters have rated all
alternatives) but as long as $K_G$ is loop-free, such a claim still holds. In finance,
this theorem translates into the well-known result that "triangular arbitrage-free
implies arbitrage-free." The theorem enables us to infer global consistency from a
local condition: whether the ranking data is curl-free. We note that being curl-free
is a strong condition. If we instead have "triangular transitivity" in the ordinal
sense, i.e. $a \succeq b \succeq c$ implies $a \succeq c$, then there is no result analogous to the
theorem above.

At least for Erdős-Rényi random graphs, the Betti number $\beta_1$ could only be
non-zero when the edges are neither too sparse nor too dense. The following result
by Kahle [22] quantifies this statement. He showed that $\beta_1$ undergoes two phase
transitions from zero to nonzero and back to zero as the density of edges grows.

Theorem 5.3 (Kahle 2006). For an Erdős-Rényi random graph $G(n,p)$ on $n$ vertices
where the edges are independently generated with probability $p$, its clique complex
$K_G$ almost always has $\beta_1(K_G) = 0$, except when

(28) $\frac{1}{n^2} \ll p \ll \frac{1}{n}$.

Without getting into a discussion about whether Erdős-Rényi random graphs are
good models for pairwise ranking comparison graphs of real-world ranking data, we
note that the Netflix pairwise comparison graph has a high probability of having
$\beta_1(K_G) = 0$ if Kahle's result applies. Although the original customer-product rating
matrix of the Netflix prize dataset is highly incomplete (more than 99% missing
values), its pairwise comparison graph is very dense (less than 0.22% missing edges).
In other words, $p$ (probability of an edge) and $n$ (number of vertices) are both large
and so (28) is not satisfied.

6. $\ell_1$-Aspects of Hodge Theoretic Ranking

Hodge theory is by and large an $\ell_2$-theory: inner products on cochains, adjoint
of coboundary operators, orthogonality of Hodge decomposition, are all naturally
associated with (weighted or unweighted) $\ell_2$-norms. In this section, we will take
an oblique approach and study the $\ell_1$-aspects of combinatorial Hodge theory in the
context of statistical ranking, with robustness and parsimony (or sparsity) being
our two obvious motivations. We will study two $\ell_1$-norm minimization problems:
(1) the $\ell_1$-projection on gradient flows (globally consistent rankings), which we show
to have a dual problem as correlation maximization over bounded divergence-free
flows (cyclic rankings); (2) an $\ell_1$-approximation to find sparse divergence-free flows
(cyclic rankings) homologous to the residual of the $\ell_2$-projection, which we show
to have a dual problem as correlation maximization over bounded curl-free flows
(locally consistent rankings). We observe that the primal versus dual relation is
revealed as an 'im(grad) versus ker(div)' relation in the first case and an 'im(curl*)
versus ker(curl)' relation in the second case.

6.1. Robust Ranking: $\ell_1$-projection on gradient flows. We have briefly mentioned
this problem in Section 2 as an $\ell_1$-variation of the least squares model (7)
for statistical ranking. Here we will derive a duality result for (9). As before, we
assume a pairwise comparison graph $G = (V, E)$ and an edge flow $Y \in C^1(K_G, \mathbb{R})$
that comes from our ranking data. Consider the following minimization problem,

(29) $\min \|X - Y\|_{1,w}$ s.t. $X = \operatorname{grad} s$, $X = -X^\top$,

which may be regarded as the $\ell_1$-projection of an edge flow $Y$ onto the space of
gradient flows,

(30) $\min_{s \in C^0} \|\operatorname{grad} s - Y\|_{1,w} = \min_{s \in C^0} \sum_{i,j} w_{ij} |s_j - s_i - Y_{ij}|$.



[Footnote: The projection of a point $X$ onto a closed subset $S$ in a finite-dimensional norm space is
simply the unique point $X_S \in S$ that is nearest to $X$ in the norm.]

In other words, we attempt to find the nearest globally consistent ranking $\operatorname{grad} s$ to
the pairwise ranking $Y$ as measured by the $\ell_1$-norm. Such a norm is often employed
in robust regression since its solutions will be relatively more robust to outliers or
large deviations in the ranking data $Y$ when compared to the $\ell_2$-norm in (7) [4, 12].
The computational cost paid in going from (7) to (29) is that of replacing a linear
least squares problem with a linear programming problem.
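
The robustness claim can be seen on a toy example. The sketch below is not the linear programming formulation itself: for illustration it uses the fact that an optimal solution of an $\ell_1$ regression can be found among the candidates that fit a linearly independent set of data exactly, which for (30) means a spanning tree of edges, and it simply enumerates all spanning trees. This brute force is only viable for very small graphs; the function name `l1_projection` and the data are hypothetical, and a connected graph with $s_0 = 0$ is assumed.

```python
from fractions import Fraction
from itertools import combinations

def l1_projection(n, edges):
    """Minimize sum w_ij |s_j - s_i - Y_ij| with s_0 = 0 by enumerating spanning trees.

    edges maps (i, j) to (w_ij, Y_ij).
    """
    best_cost, best_s = None, None
    for tree in combinations(edges, n - 1):
        # Propagate scores from vertex 0 so that the tree edges are fitted exactly.
        s = {0: 0}
        grew = True
        while grew:
            grew = False
            for i, j in tree:
                y = edges[i, j][1]
                if i in s and j not in s:
                    s[j], grew = s[i] + y, True
                elif j in s and i not in s:
                    s[i], grew = s[j] - y, True
        if len(s) < n:      # the chosen n-1 edges do not span the graph
            continue
        cost = sum(w * abs(s[j] - s[i] - y) for (i, j), (w, y) in edges.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_s = cost, [s[v] for v in range(n)]
    return best_s, best_cost

# Consistent data from scores (0, 1, 2, 3) on a complete graph, with one outlier.
true_s = [0, 1, 2, 3]
edges = {(i, j): (1, true_s[j] - true_s[i]) for i, j in combinations(range(4), 2)}
edges[0, 1] = (1, 6)   # corrupted comparison; the consistent value would be 1
s_l1, cost_l1 = l1_projection(4, edges)

# l2 solution via the closed form s_i = -(1/n) sum_j Y_ij for complete graphs.
Y = [[0] * 4 for _ in range(4)]
for (i, j), (_, y) in edges.items():
    Y[i][j], Y[j][i] = y, -y
s_l2 = [-Fraction(sum(row), 4) for row in Y]
order_l2 = sorted(range(4), key=lambda v: s_l2[v])
print(s_l1, cost_l1, s_l2, order_l2)
```

In this example the $\ell_1$ solution reproduces the uncorrupted scores and charges the whole deviation to the outlier edge, whereas the least squares solution spreads the error and swaps the order of alternatives 1 and 2.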

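Recall that the minimum norm $\ell_2$-minimizer is given by $s^* = -(\Delta_0)^\dagger \operatorname{div} Y$ and
the $\ell_2$-residual is given by $R^* = Y - \operatorname{grad} s^*$. Hence

$\min_{s \in C^0} \|\operatorname{grad} s - Y\|_{1,w} = \min_{s' \in C^0} \|\operatorname{grad} s' - R^*\|_{1,w}$

where $s' = s - s^*$. It follows that the $\ell_1$-minimizers in (30) may be characterized by

$\operatorname{argmin}_{s \in C^0} \|\operatorname{grad} s - Y\|_{1,w} = \operatorname{argmin}_{s \in C^0} \|\operatorname{grad} s - Y\|_{2,w} + \operatorname{argmin}_{s' \in C^0} \|\operatorname{grad} s' - R^*\|_{1,w}$.

The deviation from the minimum norm $\ell_2$-minimizer $s^*$ is a 'median gradient flow'
extracted from the cyclic residual $R^*$, which moves the $\ell_1$-residual $Y - \operatorname{grad}(s^* + \tilde{s})$
outside the space of divergence-free flows; here

$\tilde{s} \in \operatorname{argmin}_{s' \in C^0} \|\operatorname{grad} s' - R^*\|_{1,w}$.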



On the other hand, in the dual problem to (29), we search for a solution inside the
space of divergence-free flows. More precisely, the dual form of the $\ell_1$-projection
(29) searches within a space of bounded divergence-free flows for a flow that is
maximally correlated with $Y$. Before we state this result, we note that the inner
product defined in (14) for skew-symmetric matrices representing edge flows,

$\langle X, Y \rangle_w = \sum_{\{i,j\} \in E} w_{ij} X_{ij} Y_{ij}$,

also defines an inner product over $\mathbb{R}^{n \times n}$ if the symmetric weight matrix $W = [w_{ij}]$
has no zero entries, i.e. $w_{ij} > 0$ for all $i,j$. We will assume that this is the case in
the following proposition.



Proposition 6.1. The $\ell_1$-projection problem (29) has the following dual problem,

(31) $\max \langle X, Y \rangle_w$ s.t. $|X_{ij}| \le 1$, $\operatorname{div} X = 0$, $X = -X^\top$.

Proof. This follows from standard duality theory for linear programming. See [44]
for example. □
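
The duality in Proposition 6.1 can be traced through a short minimax computation. The following is only a sketch under stated assumptions: the weighted $\ell_1$-norm and the inner product $\langle \cdot, \cdot \rangle_w$ are taken over the same index set, $\operatorname{div} = -\operatorname{grad}^*$ with respect to that inner product, and the exchange of min and max is justified by linear programming duality. The auxiliary flow $Z$ is introduced here for the derivation and does not appear in the text.

```latex
\begin{align*}
\min_{s \in C^0} \|\operatorname{grad} s - Y\|_{1,w}
  &= \min_{s \in C^0} \; \max_{|Z_{ij}| \le 1} \langle Z, \operatorname{grad} s - Y \rangle_w \\
  &= \max_{|Z_{ij}| \le 1} \Big( -\langle Z, Y \rangle_w
       + \min_{s \in C^0} \langle \operatorname{grad}^* Z, s \rangle \Big) \\
  &= \max_{|Z_{ij}| \le 1,\; \operatorname{div} Z = 0} -\langle Z, Y \rangle_w \\
  &= \max_{|X_{ij}| \le 1,\; \operatorname{div} X = 0} \langle X, Y \rangle_w ,
  \qquad X = -Z .
\end{align*}
```

The inner minimum over $s$ is $0$ when $\operatorname{grad}^* Z = 0$ and $-\infty$ otherwise, which is where the divergence-free constraint on the dual variable comes from.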



Proposition 6.1 shows that for $\ell_1$-projections, the dual problem searches in the
orthogonal complement of the primal domain. The primal search space is the space
of gradient flows im(grad) while the dual search space is the space of divergence-free
flows ker(div). Recall that for $\ell_2$-projections, gradient flows correspond to the
solutions while divergence-free flows correspond to the residuals. So the solution-residual
split in the $\ell_2$-setting is in this sense analogous to the primal-dual split in
the $\ell_1$-setting.

[Footnote: Recall that argmin refers to the set of all minimizers. The addition of sets here is just the
usual Minkowski sum.]

An optimal $\ell_1$-minimizer of (29) can only be decided up to a constant from the
complementary conditions,



0< |Xy| <l^s, 



The constraint $s_1 = 0$ may be imposed to remove this extra degree of freedom.

6.2. Conflict Identification: $\ell_1$-minimization for approximate sparse cyclic
rankings. In the discussion at the end of Section 5.1 we mentioned that an $\ell_1$-approximate
sparse cyclic ranking for $R^*$ may be formulated as the following $\ell_1$-minimization
problem,

(32) $\min \|X - R^*\|_1$ s.t. $X = \operatorname{curl}^* \Phi$, $X = -X^\top$.

This is equivalent to

$\min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - R^*\|_1 := \min_{\Phi \in C^2} \sum_{i,j} |(\operatorname{curl}^* \Phi)_{ij} - R^*_{ij}|$,

which is in turn equivalent to

$\min_{\Phi \in C^2} \|\operatorname{curl}^* \Phi - \operatorname{proj}_{\ker(\Delta_1)} Y\|_1$,

where $\operatorname{proj}_{\ker(\Delta_1)} Y$ is the harmonic component in $R^*$. The chief motivation for this
minimization problem has been explained at the end of Section 5.1: we would
like to identify the edges of conflicting pairs in a pairwise comparison graph so that
we may have the option of removing them to get a globally consistent ranking.



Both (29) and (32) are $\ell_1$-norm minimizations over some pairwise ranking flows.
The main difference between them lies in that the former model searches over
im(grad), the space of gradient flows, i.e. where $X = \operatorname{grad} s$, while the latter model
searches over im(curl*), the space of curl flows, i.e. where $X = \operatorname{curl}^* \Phi$. The number
of free parameters in $\operatorname{grad} s$ is just $|V| = n$ but the number of free parameters in
$\operatorname{curl}^* \Phi$ is $|T(E)|$, which is typically of the order $O(n^3)$. Therefore we expect to be
able to get a residual for (32) that is much sparser than the residual for (29) simply
because we are searching over a much larger space. As an illustration, Figure 3
shows the results of these two optimization problems on the same data.

The next proposition shows that the dual problem of (32) also maximizes correlation
with the given pairwise ranking flow $R^*$ but over bounded curl-free flows
instead of bounded divergence-free flows as in (31).



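Proposition 6.2. Let the inner product be as defined in (14), i.e.

$\langle X, Y \rangle_w := \sum_{\{i,j\} \in E} w_{ij} X_{ij} Y_{ij}$.

The dual problem of the $\ell_1$-minimization (32) is

$\max \langle X, R^* \rangle_w$ s.t. $|X_{ij}| \le 1/w_{ij}$, $\operatorname{curl} X = 0$, $X = -X^\top$.

Proof. Similar to Proposition 6.1 with grad replaced by $\operatorname{curl}^*$. □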
Figure 3. Comparisons of the two $\ell_1$-optimizations, (29) and (32),
with the same harmonic ranking. For simplicity we set weights
$w_{ij} = 1$. The arrows in the picture indicate the edge flow direction
of pairwise rankings. A. a harmonic ranking flow $h$; B. the $\ell_1$-projection
on the gradient flows by (29) (i.e. $\operatorname{grad} s_0$ where $s_0 =
\operatorname{argmin}_s \|\operatorname{grad} s - h\|_1$); C. the $\ell_1$-projection residual in (29) (i.e.
$h - \operatorname{grad} s_0$); D. the approximate sparse cycles by (32) (i.e. $h -
\operatorname{curl}^* \Phi_0$ where $\Phi_0 = \operatorname{argmin}_\Phi \|\operatorname{curl}^* \Phi - h\|_1$); E. the $\ell_1$-projection
on locally cyclic flows by (32) (i.e. $\operatorname{curl}^* \Phi_0$).



As we can see, curl in Proposition 6.2 plays the role of div in Proposition 6.1 in the
dual problem and curl* in Proposition 6.2 plays the role of grad in Proposition 6.1 in
the primal problem. There is a slight difference on the upper bounds for $|X_{ij}|$, due to
the fact that (29) uses a weighted $\ell_1$-norm while (32) uses an unweighted $\ell_1$-norm. In
both propositions, the primal and dual search spaces are orthogonal complements of
each other as given by the Helmholtz decomposition theorem. These two problems
thus exhibit a kind of structural duality.



7. Connections to Social Choice Theory 

Social choice theory is almost undoubtedly the discipline most closely associated
with the study of ranking, having a long history dating back to Condorcet's famous
treatise in 1785 [10] and a large body of work that led to at least two Nobel prizes.

The famous impossibility theorems of Arrow [2] and Sen [22] in social choice theory
formalized the inherent difficulty of achieving a global ranking of alternatives by
aggregating over the voters. However it is still possible to perform an approximate
rank aggregation in reasonable, systematic manners. Among the various proposed
methods, the best known ones are those by Condorcet [10], Borda [13], and Kemeny
[25]. In particular, the Kemeny approach is often regarded as the best approximate
rank aggregation method under some assumptions [46, 45]. It is however NP-hard
to compute and its sole reliance on ordinal information may be unnatural in the
context of score-based cardinal data.

We have described earlier how the minimization of (7) over the gradient flow
model class

$\mathcal{M}_G = \{X \in C^1 \mid X_{ij} = s_j - s_i,\ s : V \to \mathbb{R}\}$

leads to a Hodge theoretic generalization of Borda count but the minimization of
(7) over the Kemeny model class

$\mathcal{M}_K = \{X \in C^1 \mid X_{ij} = \operatorname{sign}(s_j - s_i),\ s : V \to \mathbb{R}\}$

leads to Kemeny optimization. In this section, we will discuss this connection in
greater detail.

The following are some desirable properties of ranking data that have been widely
studied, used, and assumed in social choice theory. A ranking problem is called
complete if each voter in $\Lambda$ gives a total ordering or permutation of all alternatives
in $V$; this implies that $w^\alpha_{ij} > 0$ for all $\alpha \in \Lambda$ and all distinct $i,j \in V$, in the
terminology of Section 2. It is balanced if the pairwise comparison graph $G = (V, E)$
is $k$-regular with equal weights $w_{ij} = c$ for all $\{i,j\} \in E$. A complete and balanced
ranking induces a complete graph with equal weights on all edges. Moreover, it
is binary if every pairwise comparison is allowed only two values, say, $\pm 1$ without
loss of generality. So $Y^\alpha_{ij} = 1$ if voter $\alpha$ prefers alternative $j$ to alternative $i$, and
$Y^\alpha_{ij} = -1$ otherwise. Ties are disallowed to keep the discussion simple.

Classical social choice theory often assumes complete, balanced, and binary rankings.
However, these are all unrealistic assumptions for modern data coming from
internet and e-commerce applications. Take the Netflix dataset for illustration: a
typical user $\alpha$ of Netflix would have rated at most a very small fraction of the entire
Netflix inventory. Indeed, as we have mentioned in Section 2.2.1, the viewer-movie
rating matrix has 99% missing values. Moreover, while blockbuster movies would
receive a disproportionately large number of ratings, since just about every viewer
has watched them, the more obscure or special interest movies would receive very
few ratings. In other words, the Netflix dataset is highly incomplete and highly imbalanced.
Therefore its pairwise comparison graph is expected to have a sparse edge
structure if we ignore pairs of movies where few comparisons have been made.

Lastly, as we have discussed in Section 2.2, most modern ranking datasets including
the Netflix one are given in terms of ratings or scores on the alternatives by
the voters (e.g. one through five stars). While it is possible to ignore the cardinal 
nature of the dataset and just use its ordinal information to construct a binary 
pairwise ranking, we would be losing valuable information — for example, a 5-star 
versus 1-star comparison is indistinguishable from a 3-star versus 2-star comparison 
when one only takes the ordinal information into account. 
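
The loss of information can be made explicit in a few lines. The sketch below assumes, purely for illustration, that the cardinal pairwise comparison is the score difference and the ordinal one is its sign; the function name `pairwise` is hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)

def pairwise(score_i, score_j, cardinal=True):
    """Pairwise comparison Y_ij of alternative j against alternative i."""
    diff = score_j - score_i
    return diff if cardinal else sign(diff)

# A 5-star versus 1-star comparison and a 3-star versus 2-star comparison.
cardinal = (pairwise(1, 5), pairwise(2, 3))
ordinal = (pairwise(1, 5, cardinal=False), pairwise(2, 3, cardinal=False))
print(cardinal, ordinal)
```

The cardinal values differ (4 versus 1), while the ordinal values coincide.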

Therefore, one is ill-advised to apply methods from classical social choice theory 
to modern ranking data directly. We will see in the next section that our Hodge 
theoretic extension of Borda count adapts to these new features in modern datasets, 
i.e. incomplete, imbalanced, cardinal data, but still restricts to the usual Borda 
count in social choice theory for data that is complete, balanced, and ordinal/binary. 



[Footnote: This will not be true if we do not perform such thresholding. As we noted earlier, the Netflix
pairwise comparison graph is almost a complete graph missing only 0.22% of its edges although
the Netflix dataset has 99% of its values missing.]

The reader may wonder why the impossibility theorems of social choice theory
do not invalidate our Hodge theoretic approach. One reason is given in the previous
paragraph, namely, we work under different assumptions: our ranking data is
incomplete, imbalanced, cardinal, and so these impossibility results do not apply.
In particular, these impossibility theorems are about intransitivity, i.e. whether one
might have $i \succ j \succ k \succ i$, which is an ordinal condition; but our approach deals
with inconsistency, i.e. whether one might have $X_{ij} + X_{jk} + X_{ki} \ne 0$, which is a
cardinal condition. The second and more important reason is that we do not merely 
seek a global ranking but also a locally cyclic ranking and a harmonic ranking, with 
the latter two components accounting for the cyclic inconsistencies in the ranking 
data. We acknowledge at the outset that not all datasets can be reasonably assigned 
a global ranking but can sometimes be cyclic in nature. So we instead seek to an- 
alyze ranking data by examining its three constituting components: global, local, 
harmonic. The magnitude of the cyclic (local + harmonic) component then quantifies
the inconsistencies that impede a global ranking. We do not always regard
the cyclic component, which measures the cardinal equivalent of the impossibilities 
in social choice theory, as noise. In our framework, the data may be 'explained' by 
a global ranking only when the cyclic component is small; if that is not the case, 
then the cyclic component is an integral part of the ranking data and one has no 
reason to think that the global component would be any more informative than the 
cyclic component. 

7.1. Kemeny Optimization and Borda Count. The basic idea of Kemeny's
rule [25, 24] is to minimize the number of pairwise mismatches from a given ordering
of the alternatives to a voting profile, i.e. the collection of total orders on the
alternatives by each voter. The minimizers are called the Kemeny optima and
are often regarded as the most reasonable candidates for a global ranking of the
alternatives. To be precise, we define the binary pairwise ranking associated with a
permutation $\sigma \in \mathfrak{S}_n$ (the permutation group on $n$ elements) to be $Y^\sigma_{ij} = \operatorname{sign}(\sigma(i) - \sigma(j))$.
Given two total orders or permutations on the $n$ alternatives, $\sigma, \tau \in \mathfrak{S}_n$, the
Kemeny distance (also known as Kemeny-Snell or Kendall $\tau$ distance) is defined
to be

$d_K(\sigma, \tau) := \frac{1}{2} \sum_{i<j} |Y^\sigma_{ij} - Y^\tau_{ij}|$,

i.e. the number of pairwise mismatches between $\sigma$ and $\tau$. Given a voting profile as
a set of permutations on $V = \{1, \dots, n\}$ by $m$ voters, $\{\tau_i \in \mathfrak{S}_n \mid i = 1, \dots, m\}$, the
following combinatorial minimization problem

(33) $\min_{\sigma \in \mathfrak{S}_n} \sum_{i=1}^{m} d_K(\sigma, \tau_i)$

is called Kemeny optimization and is known to be NP-hard [16] with respect to
$n$ when $m \ge 4$. For binary-valued rankings with $Y^\alpha_{ij} \in \{\pm 1\}$, the optimization
problem

(34) $\min_{X \in \mathcal{M}_K} \sum_{\alpha, i, j} w^\alpha_{ij} (X_{ij} - Y^\alpha_{ij})^2$

counts up to a constant the number of pairwise mismatches from a total order.
Hence for a complete, balanced, and binary-valued ranking problem, our minimization
problem (7) becomes Kemeny optimization if we replace the subspace $\mathcal{M}_G$ by
the discrete subset $\mathcal{M}_K$.
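
For very small $n$, Kemeny optimization can be carried out by exhaustive search, which also makes the combinatorial cost visible. The sketch below is a brute-force illustration only; the function names and the convention that `sigma[i]` is the position of alternative `i` are assumptions made for this example.

```python
from itertools import combinations, permutations

def sign(x):
    return (x > 0) - (x < 0)

def kemeny_distance(sigma, tau):
    """Number of pairs {i, j} that sigma and tau order differently."""
    n = len(sigma)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if sign(sigma[i] - sigma[j]) != sign(tau[i] - tau[j])
    )

def kemeny_optima(profile):
    """All permutations minimizing the total Kemeny distance to the profile."""
    n = len(profile[0])
    cost = {p: sum(kemeny_distance(p, tau) for tau in profile)
            for p in permutations(range(n))}   # n! candidates
    best = min(cost.values())
    return sorted(p for p, c in cost.items() if c == best), best

# A Condorcet cycle: three voters whose majorities prefer 0 to 1, 1 to 2, and 2 to 0.
cycle_profile = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
optima, total = kemeny_optima(cycle_profile)
print(optima, total)
```

On the cyclic profile the optimum is not unique: each voter's own order is a Kemeny optimum, which reflects the cyclic nature of the data.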
Another well-known method for rank aggregation is the Borda count [14], which
assigns a voter's top $i$th alternative a position-based score of $n - i$; the global ranking
on $V$ is then derived from the sum of its scores over all voters. This is equivalent
to saying that the global ranking of the $i$th alternative is derived from the score

(35) $s_i = -\sum_{\alpha} \sum_{j} Y^\alpha_{ij}$,

i.e. the alternative that has the most pairwise comparisons in favor of it from all
voters will be ranked first, and so on. As we have found in (26), the minimum norm
solution of the $\ell_2$-projection onto gradient flows is given by

$s_i^* = -\frac{1}{n} \sum_{j} Y_{ij} = -c \sum_{\alpha} \sum_{j} Y^\alpha_{ij}$,

where $c$ is a positive constant. Hence for a complete, balanced, and binary ranking
problem, the Hodge theoretic approach yields the Borda count (up to a positive
multiplicative constant that has no effect on the ordering of alternatives by scores).
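
The equivalence between the position-based and the pairwise form of the Borda count can be checked directly. In the sketch below each voter's ranking is a list of alternatives from most to least preferred, and $Y^\alpha_{ij} = +1$ when voter $\alpha$ prefers $j$ to $i$; the function names and the sample profile are assumptions for this illustration.

```python
def positional_borda(profile):
    """Each voter gives n - p points to the alternative in position p (1-based)."""
    n = len(profile[0])
    return [sum(n - (ranking.index(a) + 1) for ranking in profile)
            for a in range(n)]

def pairwise_borda(profile):
    """Score of alternative i: minus the sum over voters and j of Y_ij."""
    n = len(profile[0])
    scores = [0] * n
    for ranking in profile:
        pos = {a: p for p, a in enumerate(ranking)}
        for i in range(n):
            for j in range(n):
                if i != j:
                    y_ij = 1 if pos[j] < pos[i] else -1   # +1 if j is preferred to i
                    scores[i] -= y_ij
    return scores

profile = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
positional = positional_borda(profile)
pairwise = pairwise_borda(profile)
print(positional, pairwise)
```

For a complete binary profile with $m$ voters the two scores are related by the affine map `pairwise = 2 * positional - m * (n - 1)`, so they induce the same ordering of the alternatives.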

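7.2. Comparative Studies. The following theorem gives three equivalent characterizations
of (34) when $Y^\alpha_{ij} \in \{\pm 1\}$. Note that here we do not assume that the
data is complete and balanced.

Theorem 7.1. Suppose that $Y^\alpha_{ij} \in \{\pm 1\}$. The following optimization problems are
all equivalent:

(i) The weighted least squares problem,

$\min_{X \in \mathcal{M}_K} \sum_{\alpha, i, j} w^\alpha_{ij} (X_{ij} - Y^\alpha_{ij})^2$,

where

$\mathcal{M}_K = \{X \in \mathcal{A} \mid X_{ij} = \operatorname{sign}(s_j - s_i),\ s : V \to \mathbb{R}\}$.

(ii) The linear programming problem,

(36) $\max_{X \in \mathcal{K}_1} \langle X, Y \rangle = \max_{X \in \mathcal{K}_1} \sum_{i,j} w_{ij} X_{ij} Y_{ij}$,

where $\mathcal{K}_1$ is the set

$\{\sum_{\sigma \in \mathfrak{S}_n} \mu_\sigma P^\sigma \mid \sum_\sigma \mu_\sigma = 1,\ \mu_\sigma \ge 0,\ P^\sigma_{ij} = \operatorname{sign}(\sigma(j) - \sigma(i))\}$.

(iii) The weighted $\ell_1$-minimization problem,

(37) $\min_{X \in \mathcal{K}_2} \|X - Y\|_{1,w} = \min_{X \in \mathcal{K}_2} \sum_{i,j} w_{ij} |X_{ij} - Y_{ij}|$,

where $\mathcal{K}_2$ is the set

$\{X \in \mathcal{A} \mid (s_j - s_i) X_{ij} > 0 \text{ for some } s : V \to \mathbb{R} \text{ and } \{i,j\} \in E\}$.

(iv) The minimum feedback arc set of the weighted directed graph $G_{W \circ Y} = (V, E, W \circ Y)$,
whose vertex set is $V$, directed edge $(i,j) \in E \subseteq V \times V$ iff $Y_{ij} > 0$ with
weight $w_{ij} Y_{ij}$.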
Proof. Starting from (i): since $X_{ij} \in \{\pm 1\}$ and $Y^\alpha_{ij} \in \{\pm 1\}$, we obtain

$\sum_{\alpha, i, j} w^\alpha_{ij} (X_{ij} - Y^\alpha_{ij})^2 = c - 2 \sum_{i,j} w_{ij} X_{ij} Y_{ij}$,

where $c$ is a constant that does not depend on $X$. So the problem becomes

(38) $\max_{X \in \mathcal{M}_K} \sum_{i,j} w_{ij} X_{ij} Y_{ij}$.

Since $\mathcal{M}_K$ is a discrete set containing $n!$ points, a linear programming problem over
$\mathcal{M}_K$ is equivalent to searching over its convex hull, i.e. $\mathcal{K}_1$, which gives (ii).

(iv) can also be derived from (38). Consider a weighted directed graph $G_{W \circ Y}$
where an edge $(i,j) \in E$ iff $Y_{ij} > 0$, and in which case has weight $|w_{ij} Y_{ij}|$. (38) is
equivalent to finding a directed acyclic graph by reverting a set of edge directions
whose weight sum is minimized. This is exactly the minimum feedback arc set
problem.

Finally, we show that (iii) is also equivalent to the minimum feedback arc set
problem. For any $X \in \mathcal{K}_2$, the transitive region, there is an associated weighted
directed acyclic graph $G_{W \circ X}$ where an edge $(i,j) \in E$ iff $X_{ij} > 0$, and in which case
has weight $|w_{ij} X_{ij}|$. Note that an optimizer of (37) has either $X^*_{ij} = -X^*_{ji} = Y_{ij}$
or $X^*_{ij} = -X^*_{ji} = 0$ on an edge $\{i,j\} \in E$, which is equivalent to the problem of
finding a directed acyclic graph by deleting a set of edges from $G_{W \circ Y}$ such that the
sum of their weights is minimized. Again, this is exactly the minimum feedback
arc set problem.

□
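
The first step of the proof, the identity relating the least squares objective to the correlation, can be checked by brute force on a small incomplete profile. The sketch below assumes unit weights $w^\alpha_{ij} = 1$ for every comparison a voter has made and $0$ otherwise, so that $w_{ij} Y_{ij} = \sum_\alpha w^\alpha_{ij} Y^\alpha_{ij}$; the votes are hypothetical.

```python
from itertools import permutations

def sign(x):
    return (x > 0) - (x < 0)

n = 3
# Binary votes Y^alpha_ij (+1 if the voter prefers j to i), listed for i < j.
# A pair missing from a voter's dict has not been compared by that voter.
raw_votes = [
    {(0, 1): 1, (1, 2): 1, (0, 2): 1},
    {(0, 1): -1, (1, 2): 1},
    {(0, 2): -1, (1, 2): -1},
]

def skew(v):
    """Extend a voter's comparisons skew-symmetrically: Y_ji = -Y_ij."""
    full = dict(v)
    full.update({(j, i): -y for (i, j), y in v.items()})
    return full

votes = [skew(v) for v in raw_votes]

def least_squares(X):
    return sum((X[e] - y) ** 2 for v in votes for e, y in v.items())

def correlation(X):
    # sum_ij w_ij X_ij Y_ij with w_ij Y_ij = sum_alpha w^alpha_ij Y^alpha_ij
    return sum(X[e] * y for v in votes for e, y in v.items())

results = []
for sigma in permutations(range(n)):
    X = {(i, j): sign(sigma[j] - sigma[i])
         for i in range(n) for j in range(n) if i != j}
    results.append((least_squares(X), correlation(X)))

constants = {ls + 2 * corr for ls, corr in results}
print(constants)
```

The set `constants` contains a single value, which confirms on this example that minimizing the least squares objective over $\mathcal{M}_K$ is the same as maximizing the correlation in (38).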



The set $\mathcal{K}_1$ is the convex hull of the skew-symmetric permutation matrices $P^\sigma$
as defined in [46]. The set $\mathcal{K}_2$ is called the transitive pairwise region by Saari [34],
which comprises $n!$ cones corresponding to each of the $n!$ permutations on $V$.

It is known that the minimum feedback arc set problem in (iv) is NP-hard, and
therefore, so are the other three. Moreover, (iii) provides us with some geometric
insights when we view it alongside (7), the $\ell_2$-projection onto gradient flows
$\mathcal{M}_G = \{X \in \mathcal{A} \mid X_{ij} = s_j - s_i,\ s : V \to \mathbb{R}\}$, which we have seen to be a Hodge theoretic
extension of Borda count. We will illustrate their differences and similarities
pictorially via the following example borrowed from Saari [34].

Consider the simplest case of three-item comparison with $V = \{i,j,k\}$. For
simplicity, we will assume that $w_{ij} = w_{jk} = w_{ki} = 1$ and $Y_{ij}, Y_{jk}, Y_{ki} \in [-1,1]$.
Figure 4 shows the unit cube in $\mathbb{R}^3$. We will label the coordinates in $\mathbb{R}^3$ as
$[X_{ij}, X_{jk}, X_{ki}]$ (instead of the usual $[x,y,z]$). The shaded plane corresponds to
the set where $X_{ij} + X_{jk} + X_{ki} = 0$ in the unit cube. Note that this set is
equal to the model class $\mathcal{M}_G$ because of (12). On the other hand, the transitive
pairwise region $\mathcal{K}_2$ consists of the six orthants within the cube with vertices
$\{\pm 1, \pm 1, \pm 1\} - \{[1,1,1], [-1,-1,-1]\}$. We will write
The Hodge theoretic optimization (7) is the $\ell_2$-projection onto the plane $X_{ij} + X_{jk} +
X_{ki} = 0$, while by (iii), the Kemeny optimization (34) is the $\ell_1$-projection onto the
aforementioned six orthants representing the transitive pairwise region $\mathcal{K}_2$.
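
In this three-item setting with unit weights, the $\ell_2$-projection onto the plane $X_{ij} + X_{jk} + X_{ki} = 0$ amounts to subtracting the mean of the three coordinates. The sketch below illustrates this; the function name is hypothetical and the coordinates are ordered as $[Y_{ij}, Y_{jk}, Y_{ki}]$.

```python
from fractions import Fraction

def project_onto_gradient_plane(Y):
    """l2-projection of (Y_ij, Y_jk, Y_ki) onto the plane X_ij + X_jk + X_ki = 0."""
    mean = Fraction(sum(Y), 3)
    return tuple(y - mean for y in Y)

cyclic = project_onto_gradient_plane((1, 1, 1))        # i -> j -> k -> i
transitive = project_onto_gradient_plane((1, 1, -1))   # a transitive binary vote
print(cyclic, transitive)
```

The purely cyclic vertex $[1,1,1]$ projects to the origin, so it carries no global ranking information at all, while the transitive vertex $[1,1,-1]$ projects to a nonzero gradient flow whose entries sum to zero.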

In the general setting of social choice theory, the following theorem from [M] 
characterizes the order relations between the Kemeny optimization and the Borda 
count. 

Theorem 7.2 (Saari-Merlin 2000). The Kemeny winner (the most preferred) is 
always strictly above the Kemeny loser (the least preferred) under the Borda count; 
similarly the Borda winner is always strictly above the Borda loser under the Ke- 
meny rule. There is no other constraint in the sense that the two methods may 
generate arbitrary different total orders except for those constraints. 

The Kemeny rule has several desirable properties in social choice theory which 
the Borda count lacks 02] • The Kemeny rule satisfies the Condorcet rule, in the 
sense that if an alternative in V wins all pairwise comparisons against other al- 
ternatives in V , then it must be the overall winner. A Condorcet winner is any 
alternative i such that '^2,- sign(^^ Y^) = n. Note that the Condorcet winner 
may not exist in general but Kemeny or Borda winners always exist. However, if a 
Condorcet winner exists, then it must be the Kemeny winner. On the other hand, 
Borda count can only ensure that the Condorcet winner is ranked strictly above 
the Condorcet loser (least-preferred). Another major advantage of the Kemeny 
rule is its consistency in global rankings under the elimination of alternatives in V . 
The Borda count and many other position-based rules fail to meet this condition. 
In fact, the Kemeny rule is the unique rule that meets all three of the following: (1)
satisfies the Condorcet rule, (2) consistency under elimination, and (3) a natural
property called neutrality (that we will not discuss here). See [46] for further details.

Despite the many important features that the Kemeny rule has, its high compu- 
tational cost (NP-hard) makes simpler rules like Borda count attractive in practice, 
especially when there is a large number of alternatives to be ranked. Moreover, in cardinal
rankings where it is desirable to preserve the magnitude of score differences
\r]\ and not just the order relation, using the Hodge theoretic variant of Borda 
count with model class $\mathcal{M}_G$ becomes more relevant than Kemeny optimization
with model class $\mathcal{M}_K$.



8. Experimental Studies 

We present three examples of Hodge theoretic ranking analysis of real data with 
the hope that these preliminary results would illustrate some basic ideas of our 
approach. 

The first example is about movie ranking on a subset of Netflix data. We show 
that (i) the use of pairwise ranking together with Hodge decomposition reduces 
temporal drift bias, and (ii) the triangular curls provide a metric for characterizing 
inconsistencies in the ranking data. The second example illustrates the use of 
Hodge decomposition for finding a universal equivalent or price function (i.e. global 
ranking) in a currency exchange market where triangular arbitrage-free implies
arbitrage-free (i.e. harmonic component is 0). The third example describes how the
global ranking component in Hodge decomposition may be used to approximate 
PageRank via reversible Markov chains. 
Figure 4. The shaded region is the subspace $X_{ij} + X_{jk} + X_{ki} = 0$.
The transitive region consists of six orthants whose corresponding
vertices belong to $\{\pm 1, \pm 1, \pm 1\} - \{[1, 1, 1], [-1, -1, -1]\}$. The
Borda count, or $\min_{X \in \mathcal{M}_G}$, is the $l_2$-projection onto the shaded
plane, while the Kemeny optimization, or $\min_{X \in \mathcal{M}_K}$, is the
$l_1$-projection onto the transitive region.



8.1. Movie Ranking on a Subset of Netflix Data. The Netflix prize dataset
contains about 17,000 movies rated by 480,000 customers over 74 months from 
November 1998 to December 2005. Each customer rated 209 movies on average 
and around 99% of the ratings are absent from the customer-product matrix. We 
do not seek to address the Netflix prize problem of ratings prediction here. Instead 
we take advantage of this rare publicly available dataset and use it to test the rank 
aggregation capabilities of our method. We would like to aggregate viewers' ratings 
into a global ranking on movies, and to measure the reliability of such a global 
ranking. Note that such rank aggregation could be personalized if one first collects 
the ratings from viewers who share similar tastes with an individual. This could 
then be used for rating prediction if desired which is not pursued here. 

For reasons that we will soon explain, we restrict our selections to movies 
that received ratings on all of the 74 months. There are not many such movies,
only 25 in all. Several of these have monthly average scores that show substantial
upward or downward drifts. In Figure 5 we show the temporal variations
in scores of six of these (numerical indices in the Netflix dataset are given
in parentheses): Dune (17064), Interview with the Vampire (8079), October
Sky (12473), Shakespeare in Love (17764), The Waterboy (14660), and Witness
(15057). Such temporal variations make it dubious to rank movies by simply tak- 
ing average score over all users, as ratings over different time periods may not be 
comparable under the same scale. It is perhaps worth noting that understanding 
the temporal dynamics in the Netflix dataset has been a key factor in the approach 
of Bell and Koren [5]. We will see below that the use of pairwise ranking and 
Hodge decomposition provides an effective method to globally rank the movies and 
detect any inherent inconsistency and that is furthermore robust under temporal 
variations. 



http://www.netflixprize.com
Formation of pairwise ranking: Since pairwise rankings are relative measures,
we expect that they will reduce the effect of temporal drift. We employ
three of the statistics described in Section 2.2.1 to form our pairwise
rankings, using only ratings by the same customer in the same month. We
compute the arithmetic mean of score differences,

$$Y_{ij} = \frac{\sum_\alpha (a_{\alpha j} - a_{\alpha i})}{\#\{\alpha \mid a_{\alpha i}, a_{\alpha j} \text{ exist in the same month}\}},$$

the (logarithm of the) geometric mean of score ratios,

$$Y_{ij} = \frac{\sum_\alpha (\log a_{\alpha j} - \log a_{\alpha i})}{\#\{\alpha \mid a_{\alpha i}, a_{\alpha j} \text{ exist in the same month}\}},$$

and binary comparisons,

$$Y_{ij} = \Pr\{\alpha \mid a_{\alpha j} > a_{\alpha i}\} - \Pr\{\alpha \mid a_{\alpha j} < a_{\alpha i}\},$$

where $\alpha$ is such that $a_{\alpha i}, a_{\alpha j}$ exist in the same month.

Since there is nothing to suggest that a logarithmic scale is relevant, the
logarithmic odds ratio gives rather poor results, as expected, and we omit
it. For comparison, we compute the mean score of each movie over all
customers, ignoring the temporal information. A reference score is collected
independently from MRQE (Movie Review Query Engine), the largest
online directory of movie reviews on the internet.
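
The three pairwise statistics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the tuple layout `(customer, month, movie, score)`, the function name `pairwise_statistics` and the toy ratings are hypothetical, and the score ratio is averaged on a logarithmic scale so that the resulting flow satisfies $Y_{ij} = -Y_{ji}$.

```python
import math
from collections import defaultdict

def pairwise_statistics(ratings, i, j):
    """Pairwise statistics for movies i and j.

    ratings: iterable of (customer, month, movie, score) tuples
    (a hypothetical layout chosen for this sketch). Only customers who
    rated both movies in the same month contribute.
    """
    by_key = defaultdict(dict)  # (customer, month) -> {movie: score}
    for customer, month, movie, score in ratings:
        by_key[(customer, month)][movie] = score

    diffs, log_ratios, signs = [], [], []
    for scores in by_key.values():
        if i in scores and j in scores:
            ai, aj = scores[i], scores[j]
            diffs.append(aj - ai)
            log_ratios.append(math.log(aj) - math.log(ai))
            signs.append((aj > ai) - (aj < ai))  # +1, 0 or -1

    m = len(diffs)
    if m == 0:
        return None  # no edge between i and j in the comparison graph
    return {
        "difference": sum(diffs) / m,     # arithmetic mean of score differences
        "log_ratio": sum(log_ratios) / m,  # log of geometric mean of score ratios
        "binary": sum(signs) / m,          # Pr{a_j > a_i} - Pr{a_j < a_i}
        "weight": m,                       # number of contributing comparisons
    }

ratings = [
    ("u1", 1, "X", 4), ("u1", 1, "Y", 5),
    ("u2", 1, "X", 3), ("u2", 2, "Y", 5),  # different months: ignored
    ("u3", 3, "X", 2), ("u3", 3, "Y", 1),
    ("u4", 3, "X", 3), ("u4", 3, "Y", 5),
]
stats = pairwise_statistics(ratings, "X", "Y")
print(stats)
```

Swapping the two movies flips the sign of every statistic, which is the skew-symmetry required of an edge flow; the `weight` entry is the count that appears in the denominators above.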
Global ranking by Hodge decomposition: We then solve the regression
problem in ([T]) to obtain a projection of pairwise ranking flows onto gradient
flows, given by Theorem 5.1(i). Note that in this example, the pairwise
ranking graph is complete with n = 6 nodes. Table 1 collects the comparisons
between different global rankings. The reference order of movies is
again via the MRQE scores.
Inconsistencies and curls: Since the pairwise ranking graph is complete,
its clique complex is a simplex with n = 6 vertices and so the harmonic
term in the Hodge decomposition is always zero. Hence the residual in
Theorem 5.1 is just the curl projection, i.e. $R^* = \operatorname{proj}_{\operatorname{im}(\operatorname{curl}^*)} Y$. We will
define two indices of inconsistency to evaluate the results. The first, called
cyclicity ratio, is a measure of global inconsistency given by

$$C_p = \frac{\|R^*\|_{2,w}^2}{\|\operatorname{grad} s^*\|_{2,w}^2},$$

while the second, called relative curl, quantifies the local inconsistency, and
is given by the following function of edges and triangles,

$$C_r = \frac{(\operatorname{curl} Y)(i,j,k)}{3(\operatorname{grad} s^*)(i,j)} = \frac{Y_{ij} + Y_{jk} + Y_{ki}}{3(s^*_j - s^*_i)}.$$

Note that on every triangle $t_{ijk}$ the curl $Y_{ij} + Y_{jk} + Y_{ki}$ measures the total
sum of cyclic flow, therefore $C_r$ measures the magnitude of its induced edge
flow relative to the gradient edge flow of the global ranking $s^*$. If $C_r$ has
absolute value larger than 1, then the average cyclic flow has an effect larger
than the global ranking $s^*$, which indicates that the global ranking $s^*$ might
be inconsistent on the pair of items.
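
The two inconsistency indices can be tried on a toy edge flow. The sketch below is written under explicit assumptions that are not taken from the paper: the graph is complete with unit weights, so the least squares global ranking has the closed form $s^*_i = -\frac{1}{n}\sum_j Y_{ij}$ (normalized to sum to zero), and the cyclicity ratio is taken as the squared norm of the residual divided by the squared norm of the gradient flow. All function names and the 3-node flow are hypothetical.

```python
from itertools import combinations

def hodge_rank_complete(Y):
    """Global ranking s* and residual R* for a skew-symmetric flow Y.

    Assumes a complete graph with unit edge weights (an assumption of this
    sketch); then s*_i = -(1/n) * sum_j Y[i][j] minimizes
    sum_{i<j} (s_j - s_i - Y[i][j])**2 among rankings with sum(s) = 0.
    """
    n = len(Y)
    s = [-sum(Y[i]) / n for i in range(n)]
    R = [[Y[i][j] - (s[j] - s[i]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    return s, R

def cyclicity_ratio(Y):
    """Squared residual norm over squared gradient norm (unit weights)."""
    s, R = hodge_rank_complete(Y)
    edges = list(combinations(range(len(Y)), 2))
    residual = sum(R[i][j] ** 2 for i, j in edges)
    gradient = sum((s[j] - s[i]) ** 2 for i, j in edges)
    return residual / gradient

def relative_curl(Y, s, i, j, k):
    """Curl on triangle (i, j, k) relative to the gradient flow on edge (i, j)."""
    return (Y[i][j] + Y[j][k] + Y[k][i]) / (3 * (s[j] - s[i]))

# Toy flow on three items: Y[0][1] = Y[1][2] = Y[0][2] = 1, so the curl on
# the triangle (0, 1, 2) is 1 + 1 - 1 = 1 and the flow is not a pure gradient.
Y = [[0.0, 1.0, 1.0],
     [-1.0, 0.0, 1.0],
     [-1.0, -1.0, 0.0]]
s, R = hodge_rank_complete(Y)
print(s)
print(cyclicity_ratio(Y))
print(relative_curl(Y, s, 0, 1, 2))
```

For this flow the gradient part has differences 2/3, 2/3 and 4/3 on the three edges and the residual has magnitude 1/3 on each, giving a cyclicity ratio of 1/8 and a relative curl of 1/2 on the edge (0, 1). With non-uniform weights or an incomplete graph the closed form no longer applies and a weighted least squares solver would be needed.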

http://www.mrqe.com
[Figure 5 plot: one panel per movie, monthly average score against Month Number; recovered panel titles: Shakespeare in Love, Witness, October Sky.]

Figure 5. Average scores of 6 selected movies over 74 months.
The three movies in the top row have a decreasing trend in monthly
average scores, while in contrast the other three movies in the
bottom row exhibit an increasing trend.



Table 1 shows that in terms of cyclicity ratio, the best global ranking is obtained
from Hodge decomposition of pairwise rankings from binary comparisons, which
has the smallest cyclicity ratio, 0.30. This global ranking is quite different from
merely taking mean scores and is a better predictor of MRQE.

A closer analysis of relative curls allows us to identify the dubious scores. We will 
see that the placement of Witness and October Sky according to the global ranking 
contains significant inconsistency and should not be trusted. This inconsistency is 
largely due to the curls in the triangles 

$t_1$ = (Witness, October Sky, The Waterboy),

$t_2$ = (Witness, October Sky, Interview with the Vampire).

In fact, there are only two relative curls whose magnitudes exceed 1; both occurred
on triangles that contain the edge e = (Witness, October Sky). The relative curl
of $t_1$ with respect to e is 3.6039 while that of $t_2$ with respect to e is 4.1338. As we
can see from Table 1, the inconsistency (large curl) manifests itself as instability
in the placement of Witness and October Sky: the results vary across different
rank aggregation methods with no possibility of consensus. This illustrates the use
of curl as a certificate of validity for global ranking.
Global ranking (score):

| Movie                      | MRQE   | Mean     | Hodge-Difference | Hodge-Ratio | Hodge-Binary |
|----------------------------|--------|----------|------------------|-------------|--------------|
| Shakespeare in Love        | 1 (85) | 2 (3.87) | 1 (0.247)        | 2 (0.0781)  | 1 (0.138)    |
| Witness                    | 2 (77) | 3 (3.86) | 2 (0.217)        | 1 (0.0883)  | 3 (0.107)    |
| October Sky                | 3 (76) | 1 (3.93) | 3 (0.213)        | 3 (0.0775)  | 2 (0.111)    |
| The Waterboy               | 4 (66) | 6 (3.38) | 6 (-0.464)       | 6 (-0.1624) | 6 (-0.252)   |
| Interview with the Vampire | 5 (65) | 4 (3.71) | 4 (-0.031)       | 4 (-0.0121) | 4 (-0.012)   |
| Dune                       | 6 (44) | 5 (3.49) | 5 (-0.183)       | 5 (-0.0693) | 5 (-0.092)   |
| Cyclicity ratio            | -      | -        | 0.77             | 1.15        | 0.30         |

Table 1. Global ranking of selected six movies via different methods:
MRQE, mean score over customers, Hodge decomposition
with arithmetic mean score difference, Hodge decomposition with
geometric mean score ratio, and Hodge decomposition with binary
comparisons. It can be seen that the Hodge decomposition with
binary comparisons has the smallest inconsistency in terms of the
cyclicity ratio.



8.2. Currency Exchange Market. This example illustrates a globally consistent
pairwise ranking on a complete graph using currency exchange data taken from
Yahoo! Finance. Consider a currency exchange market with V representing a
collection of seven currencies, USD, JPY, EUR, CAD, GBP, AUD, and CHF. In
this case, G = (V, E) is a complete graph since every two currencies in V are
exchangeable. Table 2 shows the exchange rates. By logarithmic transform the
exchange rates can be converted into pairwise rankings as in Example 2.2.2. The



global ranking is the solution in (23 1 (where Sq — SJ) defines an universal equivalent 
which measures the 'value' of each currency. As the reader can easily check, the
logarithmic transform of the data in Table 2 is curl-free (up to machine precision),
which in this context means triangular arbitrage-free. In other words, there is no
way one could profit from a cyclic exchange of any three currencies in V. Since G is
a complete graph, the data has no harmonic components; so Hodge decomposition
tells us that local consistency must imply global consistency, which in this context
means arbitrage-free. In other words, there is no way one could profit from a cyclic
exchange of any number of currencies in V either.
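
The check suggested to the reader can be sketched in Python using the rates printed in Table 2. The code is an illustration under stated assumptions rather than the authors' computation: edge weights are taken to be uniform, so the least squares solution on the complete graph is $x^*_i = -\frac{1}{n}\sum_j Y_{ij}$, and the log-rates are skew-symmetrized because the printed values are rounded to four decimals. For the same reason the curls computed from the printed table are small rather than exactly zero. The names `RATES` and `universal_equivalent` are hypothetical.

```python
import math

CURRENCIES = ["USD", "JPY", "EUR", "CAD", "GBP", "AUD", "CHF"]
# RATES[i][j]: one unit of currency i buys RATES[i][j] units of currency j (Table 2).
RATES = [
    [1.0000, 114.6700, 0.6869, 0.9187, 0.4790, 1.0768, 1.1439],
    [0.0087, 1.0000, 0.0060, 0.0080, 0.0042, 0.0094, 0.0100],
    [1.4558, 166.9365, 1.0000, 1.3374, 0.6974, 1.5676, 1.6653],
    [1.0885, 124.8177, 0.7477, 1.0000, 0.5214, 1.1721, 1.2451],
    [2.0875, 239.3791, 1.4340, 1.9178, 1.0000, 2.2478, 2.3879],
    [0.9287, 106.4940, 0.6379, 0.8532, 0.4449, 1.0000, 1.0623],
    [0.8742, 100.2448, 0.6005, 0.8031, 0.4188, 0.9413, 1.0000],
]

def universal_equivalent(rates):
    """Return (values, Y): exp(-x*) for each currency and the log-rate flow Y."""
    n = len(rates)
    # Skew-symmetrize the log-rates to average out rounding in the printed table.
    Y = [[0.5 * (math.log(rates[i][j]) - math.log(rates[j][i])) for j in range(n)]
         for i in range(n)]
    # Closed-form least squares solution on a complete graph with unit weights.
    x = [-sum(Y[i]) / n for i in range(n)]
    return [math.exp(-xi) for xi in x], Y

values, Y = universal_equivalent(RATES)
n = len(RATES)
# Largest triangular curl |Y_ij + Y_jk + Y_ki| over all triples of currencies.
max_curl = max(abs(Y[i][j] + Y[j][k] + Y[k][i])
               for i in range(n) for j in range(n) for k in range(n))

for name, v in zip(CURRENCIES, values):
    print(f"{name}: {v:.4f}")
print("largest triangular curl:", max_curl)
```

The printed values should be close to the last row of Table 2, and the ratio of any two of them approximately reproduces the corresponding exchange rate, which is what makes the potential a universal equivalent.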

8.3. Comparisons with PageRank and HITS. We apply Hodge theoretic ranking
to the problem of web ranking, which we take here to mean the ranking of any static linked
objects, not necessarily the World Wide Web. As we shall see, Hodge decomposition
provides an alternative to PageRank [7] and HITS [57]. In particular, it gives a
new way to approximate PageRank and enables us to study the inconsistency or
cyclicity in PageRank models.

Consider a link matrix $L$ where $L_{ij}$ is the number of links from site $i$ to $j$. There
are two well-known spectral approaches to computing the global rankings of websites
from $L$, HITS and PageRank. HITS computes the singular value decomposition
$L = U \Sigma V^\top$, where the primary left-singular vector $u_1$ gives the hub ranking and
the primary right-singular vector $v_1$ gives the authority ranking (both $u_1$ and $v_1$ are



http://finance.yahoo.com/currency-converter
Currency exchange rate table:

|                      | USD    | JPY      | EUR    | CAD    | GBP    | AUD    | CHF    |
|----------------------|--------|----------|--------|--------|--------|--------|--------|
| 1 USD =              | 1.0000 | 114.6700 | 0.6869 | 0.9187 | 0.4790 | 1.0768 | 1.1439 |
| 1 JPY =              | 0.0087 | 1.0000   | 0.0060 | 0.0080 | 0.0042 | 0.0094 | 0.0100 |
| 1 EUR =              | 1.4558 | 166.9365 | 1.0000 | 1.3374 | 0.6974 | 1.5676 | 1.6653 |
| 1 CAD =              | 1.0885 | 124.8177 | 0.7477 | 1.0000 | 0.5214 | 1.1721 | 1.2451 |
| 1 GBP =              | 2.0875 | 239.3791 | 1.4340 | 1.9178 | 1.0000 | 2.2478 | 2.3879 |
| 1 AUD =              | 0.9287 | 106.4940 | 0.6379 | 0.8532 | 0.4449 | 1.0000 | 1.0623 |
| 1 CHF =              | 0.8742 | 100.2448 | 0.6005 | 0.8031 | 0.4188 | 0.9413 | 1.0000 |
| Universal equivalent | 1.7097 | 0.0149   | 2.4890 | 1.8610 | 3.5691 | 1.5878 | 1.4946 |

Table 2. The last line is given by $\exp(-x^*)$ where $x^*$ is the
solution to (23). The data was taken from the Yahoo! Finance
Currency Converter on November 6, 2007.



nonnegative real-valued by the Perron-Frobenius theorem). PageRank constructs
from $L$ a Markov chain on the sites given by

$$P_{ij} = \alpha \frac{L_{ij}}{\sum_j L_{ij}} + (1 - \alpha) \frac{1}{n},$$

where $n$ is the number of sites and $\alpha = 0.85$ trades off between Markovian link
jumps and random surfing.
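
Both spectral rankings can be tried on a toy link matrix. The following Python sketch is a minimal illustration, not the computation used in the paper: it replaces a full singular value decomposition by power iteration on $LL^\top$, builds the PageRank chain from the row-normalized link matrix plus a uniform jump term, and assumes that every site has at least one out-link. The function names and the 3-site matrix are hypothetical.

```python
import math

def hits(L, iters=200):
    """Hub and authority vectors via power iteration on L L^T (unit 2-norm)."""
    n = len(L)
    hub = [1.0] * n
    for _ in range(iters):
        auth = [sum(L[i][j] * hub[i] for i in range(n)) for j in range(n)]  # L^T h
        hub = [sum(L[i][j] * auth[j] for j in range(n)) for i in range(n)]  # L a
        norm = math.sqrt(sum(h * h for h in hub))
        hub = [h / norm for h in hub]
    auth = [sum(L[i][j] * hub[i] for i in range(n)) for j in range(n)]
    norm = math.sqrt(sum(a * a for a in auth))
    return hub, [a / norm for a in auth]

def pagerank_matrix(L, alpha=0.85):
    """Row-stochastic PageRank chain; assumes every row of L has a nonzero sum."""
    n = len(L)
    return [[alpha * L[i][j] / sum(L[i]) + (1 - alpha) / n for j in range(n)]
            for i in range(n)]

def stationary(P, iters=500):
    """Stationary distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# L[i][j] = number of links from site i to site j.
L = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
hub, auth = hits(L)
P = pagerank_matrix(L)
pi = stationary(P)
print("hub:", hub)
print("authority:", auth)
print("PageRank:", pi)
```

In this example site 0, which links to both other sites, receives the largest hub score, while site 2, which is linked to by both other sites, receives the largest authority score.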

It is clear that we may define an edge flow via

$$Y_{ij} = \log \frac{P_{ij}}{P_{ji}}. \tag{39}$$

However what property does such a flow capture in PageRank? To answer this
question we will need to recall the notion of a reversible Markov chain: An irreducible
Markov chain with transition matrix $P$ and stationary distribution $\pi$ is
reversible if

$$\pi_i P_{ij} = \pi_j P_{ji}.$$

Therefore a reversible Markov chain $P$ has a pairwise ranking flow induced from a
global ranking,

$$\log \frac{P_{ij}}{P_{ji}} = \log \pi_j - \log \pi_i,$$

where $\log \pi$ gives the global ranking. As we mentioned in Section 2.1, $\log \pi$ may be
viewed as defining a negative potential on webpages if we regard ranking as being
directed from a higher potential site to a lower potential site. This leads to the
following interpretation.

Let $P^*$ be the best reversible approximation of the PageRank Markov chain $P$, in
the sense that

$$P^* = \operatorname*{argmin}_{\hat{P} \text{ reversible}} \left\| \log \frac{\hat{P}_{ij}}{\hat{P}_{ji}} - \log \frac{P_{ij}}{P_{ji}} \right\|_2 .$$

Then the stationary distribution of $P^*$, denoted by $\pi^*$, is a Gibbs-Boltzmann
distribution on webpages with potential $-s^*$, i.e.

$$\pi^*_i = \frac{e^{s^*_i}}{\sum_k e^{s^*_k}},$$

where $s^*$ is given by the Hodge projection of $Y$ onto the space of gradient flows.

Hence the Hodge decomposition of edge flow in (39) gives the stationary distribution
of a best reversible approximation of the PageRank Markov chain.

We may further compute the Hodge decomposition of iterated flows,

$$Y^k_{ij} = \log \frac{(P^k)_{ij}}{(P^k)_{ji}}.$$

Clearly when $k \to \infty$, the global ranking given by Hodge decomposition converges
to that given by PageRank. The benefit of the Hodge theoretic approach lies in that
(i) it provides a way to approximate the PageRank stationary distribution; and (ii)
it enables us to study the inconsistency or cyclicity in the PageRank Markov model.
The cost of computing the global ranking by Hodge decomposition in Theorem 5.1(i)
only involves a least squares problem of the graph Laplacian, which is less expensive
than eigenvector computations in PageRank. For the benefit of readers unfamiliar
with numerical linear algebra, it might be worth pointing out that even the most
basic algorithms for linear least squares problems guarantee global convergence in a
finite number of steps whereas there are (a) no algorithms for eigenvalue problems
that would terminate in a finite number of steps as soon as the matrix dimension
exceeds 4; and (b) no algorithms with guaranteed global convergence for arbitrary
input matrices.
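
The convergence claim for iterated flows can be checked numerically on a small chain. The sketch below rests on assumptions of our own rather than on the paper's setup: since every entry of the PageRank matrix is positive, the comparison graph is complete, and unit edge weights are used so that the Hodge projection has the closed form $s^*_i = -\frac{1}{n}\sum_j Y^k_{ij}$. The 3-site link matrix and all function names are hypothetical.

```python
import math

def pagerank_matrix(L, alpha=0.85):
    """Row-stochastic PageRank chain; assumes every row of L has a nonzero sum."""
    n = len(L)
    return [[alpha * L[i][j] / sum(L[i]) + (1 - alpha) / n for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def stationary(P, iters=2000):
    """PageRank stationary distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def hodge_stationary(P, k):
    """Gibbs-Boltzmann distribution exp(s*) / Z from the flow log(P^k_ij / P^k_ji)."""
    n = len(P)
    Pk = P
    for _ in range(k - 1):
        Pk = matmul(Pk, P)
    Y = [[math.log(Pk[i][j] / Pk[j][i]) for j in range(n)] for i in range(n)]
    # Closed-form Hodge projection on a complete graph with unit weights.
    s = [-sum(Y[i]) / n for i in range(n)]
    z = sum(math.exp(v) for v in s)
    return [math.exp(v) / z for v in s]

L = [[0, 2, 1],
     [1, 0, 3],
     [1, 1, 0]]
P = pagerank_matrix(L)
pi = stationary(P)
errors = {}
for k in (1, 2, 4, 256):
    pi_k = hodge_stationary(P, k)
    errors[k] = max(abs(a - b) for a, b in zip(pi_k, pi))
    print(k, errors[k])
```

For large $k$ the rows of $P^k$ approach $\pi$, so $Y^k_{ij}$ approaches $\log \pi_j - \log \pi_i$, which is an exact gradient flow; the printed error for $k = 256$ should therefore be at the level of floating-point rounding. For real link data one would use the paper's weighted least squares formulation instead of the closed form.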

To illustrate this discussion, we use the UK Universities Web Link Structure
dataset. The dataset contains the number of web links between 111 UK universities
in 2002. Independent of this link structure is a research score for each
university, RAE 2001, performed during the 5-yearly Research Assessment Exercise.
The RAE scores are widely used in UK for measuring the quality of research
in universities. We used 107 universities by eliminating four that are missing either
RAE score, in-link, or out-link. The data has also been used by [13] recently but
for a different purpose. Table 3 summarizes the comparisons among nine global
rankings: RAE 2001, in-degree, out-degree, HITS authority, HITS hub, PageRank,
Hodge rank with k = 1, 2, and 4, respectively. We then use Kendall $\tau$-distance [25]
to count the number of pairwise mismatches between global rankings, normalized
by the total number of pairwise comparisons.
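
A normalized Kendall $\tau$-distance of this kind can be sketched as follows. The function name and the convention that tied pairs count as agreements are assumptions of this example; the paper does not specify how ties are handled.

```python
from itertools import combinations

def kendall_tau_distance(scores_a, scores_b):
    """Fraction of item pairs that the two score lists order oppositely.

    scores_a[i] and scores_b[i] are the scores of item i under two rankings.
    Pairs tied in either list are counted as agreements (an assumption).
    """
    pairs = list(combinations(range(len(scores_a)), 2))
    mismatches = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return mismatches / len(pairs)

print(kendall_tau_distance([1, 2, 3], [1, 2, 3]))  # identical order
print(kendall_tau_distance([1, 2, 3], [3, 2, 1]))  # reversed order
print(kendall_tau_distance([1, 2, 3], [1, 3, 2]))  # one swapped pair out of three
```

The distance is 0 for identical orders and 1 for reversed orders, so the entries of a table of such distances can be read as the fraction of pairs on which two rankings disagree.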

9. Summary and Conclusion 

We introduced combinatorial Hodge theory to statistical ranking methods based
on minimizing pairwise ranking errors over a model space. In particular, we proposed
a Hodge theoretic approach towards determining the global, local, and harmonic
ranking components of a dataset of voters' scores on alternatives. The global
ranking is learned via an $l_2$-projection of a pairwise ranking edge flow onto the space
of gradient flows. We saw that among other connections to classical social choice



This is available from http://cybermetrics.wlv.ac.uk/database/stats/data. We used
counts at the directory level.

http://www.rae.ac.uk
Kendall $\tau$-distance:

|                | RAE '01 | in-degree | out-degree | HITS authority | HITS hub | PageRank | Hodge (k = 1) | Hodge (k = 2) | Hodge (k = 4) |
|----------------|---------|-----------|------------|----------------|----------|----------|---------------|---------------|---------------|
| RAE '01        | -       | 0.0994    | 0.1166     | 0.0961         | 0.1115   | 0.0969   | 0.1358        | 0.0975        | 0.0971        |
| in-degree      | 0.0994  | -         | 0.0652     | 0.0142         | 0.0627   | 0.0068   | 0.0711        | 0.0074        | 0.0065        |
| out-degree     | 0.1166  | 0.0652    | -          | 0.0672         | 0.0148   | 0.0647   | 0.1183        | 0.0639        | 0.0647        |
| HITS authority | 0.0961  | 0.0142    | 0.0672     | -              | 0.0627   | 0.0119   | 0.0736        | 0.0133        | 0.0120        |
| HITS hub       | 0.1115  | 0.0627    | 0.0148     | 0.0627         | -        | 0.0615   | 0.1121        | 0.0607        | 0.0615        |
| PageRank       | 0.0969  | 0.0068    | 0.0647     | 0.0119         | 0.0615   | -        | 0.0710        | 0.0029        | 0.0005        |
| Hodge (k = 1)  | 0.1358  | 0.0711    | 0.1183     | 0.0736         | 0.1121   | 0.0710   | -             | 0.0692        | 0.0709        |
| Hodge (k = 2)  | 0.0975  | 0.0074    | 0.0639     | 0.0133         | 0.0607   | 0.0029   | 0.0692        | -             | 0.0025        |
| Hodge (k = 4)  | 0.0971  | 0.0065    | 0.0647     | 0.0120         | 0.0615   | 0.0005   | 0.0709        | 0.0025        | -             |

Table 3. Kendall $\tau$-distance between different global rankings.
Note that HITS authority gives the nearest global ranking to the
research score RAE '01, while Hodge decompositions for $k \geq 2$ give
closer results to PageRank, which is the second closest to the
RAE '01.



theory, the score recovered from this global ranking is a generalization of the well-known
Borda count to ranking data that is cardinal, imbalanced, and incomplete.
The residual left is the $l_2$-projection onto the space of divergence-free flows. A
subsequent $l_2$-projection of this divergence-free residual onto the space of curl-free
flows then yields a harmonic flow. This decomposition of pairwise ranking data into
a global ranking component, a locally cyclic ranking component, and a harmonic
ranking component, is called the Helmholtz decomposition.

Consistency of the ranking data is governed to a large extent by the structure of 
its pairwise comparison graph; this is in turn revealed in the Helmholtz decompo- 
sition associated with the graph Helmholtzian, the combinatorial Laplacian of the 
3-clique complex. The sparsity structure of a pairwise comparison graph imposes 
certain constraints on the topology and geometry of its clique complex, which in 
turn decides the properties of our statistical ranking algorithms. 

In addition one may use an $l_1$-approximate sparse cyclic ranking to identify
conflicts among voters. The $l_1$-minimization problem for this has a dual given by
correlation maximization over bounded curl-free flows. On the other hand, the
$l_1$-projection on the gradient flows, which we view as a robust variant of the $l_2$-version,
has a dual given by correlation maximization over bounded cyclic flows.

Our results suggest that combinatorial Hodge theory could be a promising tool 
for the statistical analysis of ranking, especially for datasets with cardinal, incom- 
plete, and imbalanced information. 